Automatically Detecting Points of Interest and Social Networks from Tracking Positions of Avatars in a Virtual World Frank Kappe, Bilal Zaka, Michael Steurer Institute for Information Systems and Computer Media Graz University of Technology Graz, Austria {frank.kappe, bzaka, michael.steurer}@iicm.tugraz.at

Abstract With hundreds of millions of users already today, virtual worlds will become an important factor in tomorrow's media landscape. In a virtual world, users are represented by so-called avatars. These avatars move around the virtual world, communicate with each other, and interact with the virtual world. The movements of these avatars can be tracked precisely, and useful information can be inferred from analyzing these movements. In this paper, we analyze a large data set (>200 million records) of position data describing the movements of avatars in the virtual world Second Life. The dataset was derived from in-world sensors deployed beforehand; so-called bots can also be used to gather such information. From this data, we can track usage patterns of avatars (and therefore users) over time. We can also identify regions of high interest where a large number of users gather frequently (which would be important for planning advertising in the virtual world), and visualize this statistical analysis using heat maps. By combining the position data with information about the language spoken by the avatars, we can label these regions according to the language predominantly spoken there. Analyzing incidents of co-location of avatars over a period of time, we can automatically infer friends, and eventually social networks. Using additional metadata such as language, we can label clusters in this automatically generated social network.

1. Introduction

By common definition, a virtual world is a computer simulated environment where different users can reside and interact with each other. The user's representation can be either textual or graphical (2D and 3D) and is termed an avatar. Virtual worlds are derived from networked games, introduced as early as the 1970s. With the advent of affordable 3D hardware, these gaming virtual worlds, also known as Massively Multiplayer Online Games (MMOGs), gained a lot of attention. MMOGs like World of Warcraft, Runescape, and Lord of the Rings are among

the most popular ones to date, with millions of paying registered users. Recently, we have seen the development of true virtual worlds, collectively called the metaverse. The metaverse shares some characteristics of MMOGs, like navigation in a simulated environment and multiple modes of interaction between users; however, there are a few major differences as well. These include the lack of predefined user objectives in virtual world environments, combined with the ability to define personal objectives or goals. In some virtual worlds, users can create, trade, and own digital content. The ability to generate and own content forms the basis of a virtual economy, where users can buy and sell virtual goods. In addition to social networking, such virtual environments are also being used for education and business activities. This new medium provides very rich and diverse possibilities for personal learning and gives an alternate platform for enterprise presence. The virtual world Second Life (http://secondlife.com) has millions of registered users. However, a significant fraction of users become inactive after failing to define proper goals and objectives in a virtual environment. For businesses, failure in virtual worlds as an advertising medium is mainly due to the "nobody is there" problem. These issues arise because very little is known to date about navigational trends and social and attention demographics. An organized representation of such information can help individuals and businesses establish a more stable and useful presence in virtual worlds. Virtual worlds also represent an opportunity to analyze a multitude of data about avatars, which are controlled by humans who remain anonymous. Some previous studies analyze data derived from virtual worlds. Friedman et al.
use spatial analysis to compare the dyadic interactions (interactions between two people) of the virtual world with the real world [1]. La and Michiardi use spatial and temporal analysis of avatar position data to characterize user mobility in a virtual world, in an attempt to find a relationship with user mobility in the real world [2]. In another study, the interpersonal distance (IPD) and eye-gazing data of virtual world users are analyzed to show that social interactions in online virtual environments follow similar social rules as in the real world [3]. Although these studies highlight the significance of using virtual worlds to study human social interactions and mobility trends, they are limited by restricted datasets and analysis techniques (reliance on IPD in most cases). In our work we use a broader collection of user activity and communication data in Second Life for a mix of geospatial, proxemic, and language analysis. This type of analysis provides essential marketing information about visitors in a particular location, such as their number, locations, visit duration, and other characteristics. The use of a social proximity detection algorithm together with additional language information is shown to identify friend circles and social networks. User attention overlays in the form of heat maps allow businesses to concentrate their efforts on the locations most visited by users.

2. Data Retrieval

A correct analysis of user behavior in a virtual world requires a statistically valid dataset about avatars, e.g. profile information, chat, and spatial information. In 2007, Friedman et al. presented a first approach using an automated data-gathering robot to study the spatial social behavior of pairs of avatars in Second Life. This robot was an avatar controlled by software, with the ability to walk around and collect user information. Its path was chosen randomly and therefore unsystematically, which limited performance: although the robot ran for a few days, it was only able to collect 205 samples of avatar pairs [1]. In this paper we introduce two improved methods of data mining in Second Life: static in-world sensors and out-world interception of the Second Life client-server protocol.

2.1 In-World Sensors

The Linden Scripting Language (LSL) provides mechanisms to detect and examine avatars within a specified area. The execution of scripts requires a virtual object, which implies that sensors can only be placed in regions with sufficient permissions to place ("rez") the virtual object, i.e. on one's own land or with the assistance of the land owner [4]. Once a sensor is active, it can gather data 24 hours a day, 7 days a week without any maintenance. The sensor's sampling period between two scans is a tradeoff between accuracy and the amounts of processing time, bandwidth, and recorded data. Also, the amount of memory available to scripts is limited to 16 kB. To meet these tradeoffs and restrictions, we specified a fixed interval of 60 seconds between two data samples.

The main drawback of the sensor approach is the limit on the number of avatars a sensor can detect. In general, the sensing script can detect avatars within a radius of 96 meters, but the number of avatars returned per scan is limited to 16. Because the script does not have enough internal memory available, the data returned from the sensor is immediately sent via HTTP to our central server on the Internet, which stores the data in a database. With multiple sensors there is a probability of data redundancy, e.g. two in-world sensors may detect the same event. Therefore, we apply a preprocessing step to remove these redundancies before we start our analysis. Over the last 12 months we have deployed nearly 6,000 sensors in over 1,200 different simulators, but not all simultaneously, because Second Life is evolving rapidly. At the beginning of January 2009, we had 442 active sensors in nearly 100 different simulators. A sampling rate of 1 per 60 seconds yields on average 600,000 new samples per day, which is about 18 million samples per month.

2.2 Protocol Interception

Still, the previously introduced in-world sensors have the drawback of a very limited viewing area. In November 2008, the virtual world of Second Life consisted of 1,871 km2 (http://2ndlife.com/whatis/economy_stats.php), which corresponds to about 28,500 simulators of 256x256 meters each. Although we monitored 1,200 different simulators, this is only 4.21% of the entire virtual world. To get a rough overview of points of interest for in-world sensors we have to use a different approach: we can use the open source implementation of the Second Life client library libsecondlife (http://www.libsecondlife.org) to create a so-called bot, i.e. a remote-controlled avatar. We can instruct the bot to periodically visit simulators and detect user activities for the entire region.
However, Second Life is designed as an asymmetric client-server architecture, and the authentication protocol to log into a simulator consists of eight steps [5]. On average, this protocol flow requires about 7 seconds for a simulator switch. If the login process is successful, the bot can immediately detect all avatars within the simulator, store the information in a database, and log in to the next simulator. With all the additional overhead, the entire operation takes on average about 11 seconds, which yields about 8,000 processed simulators per 24 hours. Iterating over 30,000 different simulators thus gives a sampling period of about 90 hours per simulator. This approach gives only a rough overview of the avatar density in different simulators, and we therefore exploit the gathered data to find points of interest at which to place in-world sensors. Over a period of 12 months (from January 2008 to December 2008) we collected over 230 million data

samples from over 600,000 different avatars. Every sample contains the avatar's absolute position within the virtual world, the avatar's velocity and direction of sight, and a time stamp. Besides the mentioned spatial data, the sensors are capable of detecting communication data of avatars: they can listen to public chat between avatars near the sensor and transmit it to the central web server for language classification. The presented data gathering tools were developed for Second Life but can easily be adapted to compatible environments like the OpenSimulator project (http://www.opensimulator.org).
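To make the retrieval pipeline concrete, the following sketch shows a minimal sample record and the redundancy-removal step described above. The field names, and the choice of keying duplicates on the avatar and the 60-second sampling slot, are illustrative assumptions, not the exact schema of our database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PositionSample:
    """One sensor/bot sample (field names are illustrative)."""
    avatar_id: str
    global_x: float   # metres, whole-world coordinates
    global_y: float
    global_z: float
    vel_x: float      # avatar velocity
    vel_y: float
    look_x: float     # direction of sight
    look_y: float
    timestamp: int    # unix seconds

def deduplicate(samples):
    """Drop samples reported by more than one overlapping sensor.

    Two records are treated as the same event when they describe the
    same avatar within the same sampling slot (60 s in our setup).
    """
    seen = set()
    unique = []
    for s in samples:
        key = (s.avatar_id, s.timestamp // 60)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique
```

In the live system this step runs on the central server before any analysis query touches the positions table.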

3. Analysis Techniques

Virtual world environments change very rapidly, and this change generates a huge amount of activity information. As described in the previous section, there are efficient ways to log these events. There are also a number of analysis schemes used to organize and explore data; they are applied to discover consistent patterns and systematic relationships among data entities. In our experiments, we applied the following types of content analysis to our dataset.

3.1. Spatial Analysis

Figure 1. Data schema

Due to the position-based nature of the available user data, a spatial visualization system is the most appropriate way to model the data for a better understanding of the information. Recent developments in online GIS systems have made it possible to blend ("mash up") various types of information sources through the use of a service-oriented architecture. The spatial element of each gathered record is transformed to the spatial domain and combined with the mapping API of Second Life.

3.1.1 Spatial Record Set

User position data is collected by sensors set for a full-range (96 m radius) spherical scan. These sensors detect nearby avatars and transmit the data to our server, where it is stored in a common database. The dataset linked to our online prototype consists of over 200 million position records from more than 600,000 users. MySQL 5.0 Community Server was used as the database engine to store, process, and access the dataset. In the current setting, the database uses the default MyISAM storage engine, because conversion to InnoDB resulted in larger storage requirements and decreased system performance. To map position data local to a simulator to global positions relative to the whole virtual world of Second Life, two additional data fields (global_x and global_y) were calculated and added to the positions table. Figure 1 shows the resulting data schema used in the analysis. In combination with the Second Life Map API [6], our system allows its users to generate heat maps based on the position data from our database. The map information is acquired at runtime from the Second Life Map API server; this guarantees up-to-date geographical imagery of the changing virtual world environment at multiple zoom levels.
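The mapping from simulator-local positions to the global_x/global_y fields can be sketched as below. It relies on each simulator covering a 256x256 m cell of the world grid; the function names are our own illustration, not part of any Second Life API.

```python
REGION_SIZE = 256  # each Second Life simulator covers 256 x 256 m

def region_corner(grid_x, grid_y):
    """South-west corner of the simulator at grid cell (grid_x, grid_y),
    in metres on the whole-world coordinate plane."""
    return grid_x * REGION_SIZE, grid_y * REGION_SIZE

def to_global(corner_x, corner_y, local_x, local_y):
    """Map a position local to one simulator onto the whole-world
    coordinates stored as global_x / global_y in the positions table."""
    return corner_x + local_x, corner_y + local_y
```

These two fields are precomputed once per record so that map queries never need to join against per-region metadata.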

An important aspect of knowledge discovery is to find out how events repeat in different user scenarios as time goes by. The spatial analysis system allows its users to apply different types of filters to support such pattern detection. These filters include:

i. Location based: selection of an arbitrary map area by dragging and zooming, selection of listed region (SIM) names, or region name search.

ii. User based: through a search-and-select feature one can choose single or multiple avatar IDs to be included in the information overlay generation.

iii. Temporal: users can eliminate noise by selecting a specific date/time range for the analysis.

Filtering allows faster processing and a focus on a reduced set of locations, users, and time frames. The primary method for exploring the data is an interactive map. Like almost all GIS systems, the Second Life spatial analysis application treats longitude and latitude values (global_x and global_y in our case) as data elements on a map. Traditionally, these data elements are presented on the map as points by means of some image. These images (called markers in some cases) can be linked with popup windows displaying associated information. Web-based public mapping APIs from Google, Yahoo, and Microsoft provide the functionality for adding such information overlays. Since the Second Life Map API is also based on Google's Map API, the application inherits this ability of linking heterogeneous datasets. This mode of representation is very intuitive, and thousands of applications are already online that use this approach of integrating the (geo-)spatial information space. However, where the number of data elements to be presented on the map becomes large, this way of presentation has its limitations. Plotting thousands of similar elements on a map does not convey a meaningful visualization, and mapping APIs tend to slow down (or even break down) when the number of markers rises above a few thousand. In such cases, alternate approaches are used to present the same information. One way is the use of traditional point images of different sizes or styles to give an overview; one such example can be seen in the J.UCS mashup [7], where markers of different size, color, and label identify a number of data elements at a single point. However, such a representation loses the accurate positioning information associated with the individual data elements, which in our case is very important. While the proximity of points in traditional image-based maps obscures the view and hides some markers behind others, so-called heat maps successfully overcome these issues. In order to present a large amount of user location information, we therefore decided to use the heat map (or density map) approach. The hotspot or user attention information is rendered as a heat map on top of the map, while a conventional icon/pushpin overlay presents sensor and linguistic data. The heat map draws attention to hotspots by rendering locations with higher user frequencies in "hot" colors and locations with lower user frequencies in "cold" colors. The heat map is generated at run time as a single image corresponding to the selected map area.

3.1.2 Generation of Heat Map
The heat map generation algorithm is based on the work of Corunet [8], who use heat maps to inform clients about which areas of their web pages receive the most clicks. The procedure they outline generates images on the server side; JavaScript then places the images on top of the web pages. We use the same algorithm to generate heat maps for the Second Life map image, with user position records instead of clicks. Hot locations on the map are colored with values towards the end of the color array, while cooler locations receive their colors from its beginning. The image shown in Figure 2 is a rendering of the heat map color scheme used in our system.

Figure 2. Heat map color scheme
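The server-side generation step can be sketched as follows: global positions are normalized to image pixels, a per-pixel count is accumulated, and counts are mapped onto a color ramp whose "cold" colors sit at the start of the array and "hot" colors at the end (cf. Figure 2). The three-color palette and the linear count-to-color mapping are simplifying assumptions; the production system uses a finer ramp and smoothing.

```python
def render_heatmap(points, x_min, y_min, x_max, y_max,
                   width, height, palette):
    """Density-map sketch of the heat-map step [8].

    points  -- iterable of (global_x, global_y) positions
    palette -- list of RGB tuples, ordered cold -> hot
    Returns a height x width grid of RGB tuples.
    """
    counts = [[0] * width for _ in range(height)]
    for gx, gy in points:
        # normalise world coordinates to image coordinates
        px = min(int((gx - x_min) / (x_max - x_min) * width), width - 1)
        py = min(int((gy - y_min) / (y_max - y_min) * height), height - 1)
        counts[py][px] += 1
    peak = max(max(row) for row in counts) or 1
    # higher user frequency -> colour index closer to the hot end
    return [[palette[(c * (len(palette) - 1)) // peak] for c in row]
            for row in counts]
```

A real deployment would additionally blur the count grid before coloring, so isolated samples do not render as single hot pixels.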

The latitude and longitude values for each user position in the dataset (global_x and global_y) have to be normalized to image coordinates before rendering.

3.1.3 Results

In order to test the system, all regions covered by sensors were preprocessed for user activity and sensor distribution. This preprocessing produced 477 animated GIF images (one per SIM in the current dataset), each named after its region. Each animated GIF consists of three frames showing the plain map, the sensor distribution/coverage, and the heat overlay derived from user activity in that region. Figure 3 shows the frames of a single region's preprocessed result.

Figure 3. Preprocessed result frames of “Berlin City” region

The hotter areas on the map are seen as possible points of interest. Figure 4 shows the live application interface (http://fiicmpc140.tu-graz.ac.at/sl), where different filters can be applied for more specific geospatial analysis.

Figure 4. User activity in selected region during a selected period
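The location, user, and time filters described in Section 3.1 narrow the record set before any overlay is rendered. A minimal sketch, assuming records arrive as dicts with 'region', 'avatar_id', and 'timestamp' keys (the key names are illustrative; the live system filters in SQL):

```python
def apply_filters(samples, region=None, avatar_ids=None,
                  t_start=None, t_end=None):
    """Apply the location-, user- and time-based filters of the
    spatial analysis interface; any filter left as None is skipped."""
    out = []
    for s in samples:
        if region is not None and s["region"] != region:
            continue
        if avatar_ids is not None and s["avatar_id"] not in avatar_ids:
            continue
        if t_start is not None and s["timestamp"] < t_start:
            continue
        if t_end is not None and s["timestamp"] > t_end:
            continue
        out.append(s)
    return out
```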

3.2. Language Analysis

The sensors placed in Second Life also have the ability to listen to the public chat between avatars. We used this functionality to classify the text "spoken" by the avatars according to its language using external language APIs. Google's language API [9] was used to identify the language of a particular avatar. The detected language was further mapped to a language family using the family index available at Ethnologue [10]. This language information was then stored in the avatar table and used to classify regions in Second Life according to the language spoken there. Figure 5 shows the use of linguistic analysis to identify the dominant language group of a particular region (German in this example).
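The classification step can be sketched as below. The family table is a tiny illustrative subset of an Ethnologue-style index [10]; the real index covers thousands of languages, and the per-avatar language codes are assumed to come from an external language-identification API as described above.

```python
# Illustrative subset of an Ethnologue-style language-family index [10].
LANGUAGE_FAMILY = {
    "de": "Indo-European",
    "en": "Indo-European",
    "fr": "Indo-European",
    "ja": "Japonic",
    "tr": "Turkic",
}

def classify_region(detected_languages):
    """Given the per-avatar language codes detected from public chat,
    return the predominant language of a region and its family."""
    counts = {}
    for lang in detected_languages:
        counts[lang] = counts.get(lang, 0) + 1
    dominant = max(counts, key=counts.get)
    return dominant, LANGUAGE_FAMILY.get(dominant, "unknown")
```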

Figure 5. Sensor distribution and language data in selected regions

The extent of the match between the languages of two avatars, at the level of the individual language or of the language family, also helps refine the social proximity scoring mechanism. The effect of language weights on proximity scores is described further in Section 3.3.2.

3.3. Social Proximity Analysis

Spatial distance measures are one aspect of the social distance, i.e. social proximity, between individuals [11]. The retrieved data described in Section 2 can also be used to determine social proximity and a corresponding proximity score to build a social network of acquainted avatars. The interpersonal distance used for social proximity analysis was calculated from the spatial position data coming from the sensors. As mentioned above, the entire dataset consists of over 230 million data samples from 600,000 different avatars. To reduce this vast amount of data, we focus on one specific simulator for the following analysis, which yields about 12 million data samples and 7,000 unique avatars.

3.3.1 Simple Proximity Algorithm

For this first approach we compare avatars with each other and count the number of incidents in which two avatars share the same location (within a threshold distance) at the same time. A time threshold of 10 common minutes yields approximately 10,000 pairs among 1,500 unique related avatars per month. Figure 6 shows that about 2,000 pairs of avatars share at least 100 common minutes and about 40 pairs of avatars have more than 1,000 common minutes. For illustration, we use this set of avatars for all further computations.

Figure 6. Number of related avatar pairs per month

3.3.2 Refined Algorithm

In the previous section, we computed the score between avatars only by the time they spend together, ignoring distance. Figure 7 shows the distribution of the distance between the pairs of avatars found in the previous section during their interaction. The average distance between two related avatars is 213 cm with a standard deviation of 79 cm.
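The simple co-location counting of Section 3.3.1 can be sketched as follows. The per-minute sampling structure and the threshold value in the test are illustrative; distances are computed in the same units as the stored positions.

```python
from itertools import combinations

def colocated_minutes(samples, distance_threshold):
    """Simple proximity scoring: for every pair of avatars, count the
    sampling slots in which both were within `distance_threshold` of
    each other.

    samples -- dict mapping a time slot to {avatar_id: (x, y)}; with
    one sample per minute the resulting count equals common minutes.
    """
    minutes = {}
    for slot, positions in samples.items():
        for a, b in combinations(sorted(positions), 2):
            ax, ay = positions[a]
            bx, by = positions[b]
            if ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= distance_threshold:
                minutes[(a, b)] = minutes.get((a, b), 0) + 1
    return minutes
```

Applying a cutoff such as 10 common minutes to the returned counts yields the pair set used in the rest of the analysis.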

Figure 7. Probability distribution of interacting avatars

Shearer et al. [12] noted the increasing influence of closeness between avatars on social proximity in virtual worlds. In our refined algorithm, we therefore compute the proximity score between two avatars i, j with respect to their spatial distance d_n for every common data sample n:

score(i, j) = sum_n w_l[i,j] * 2 * exp(-d_n * ln(4) / 400)    (i)

The equation sums up weighted spatial distances between the avatars. The exponential factor in (i) lets the distance weight vary between 2 for short distances and approximately 0.5 for a distance of 400 cm. Additionally, we introduce a language weight factor w_l[i,j] based on the language analysis of the avatars from Section 3.2. The value of this factor is 0.5 for different language families, 1 for the same language family but different languages, and 2 if the languages match exactly.

The derived social network can be visualized as a graph. All avatars are shown as nodes, and the proximity between avatars is indicated by the distance between these nodes: the smaller the distance between two nodes, the higher the proximity score. As the entire social network in this graph representation consists of 1,500 distinct avatars, Figure 8 shows just a tiny cluster of the complete network, with 28 avatars.
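The refined scoring of equation (i) can be sketched directly. The decay constant below is calibrated so that the distance weight falls from 2 at co-location to about 0.5 at 400 cm, matching the behavior described in the text; the constant names are our own.

```python
import math

# language-weight factor w_l from Section 3.2
W_SAME_LANGUAGE = 2.0
W_SAME_FAMILY = 1.0
W_DIFFERENT_FAMILY = 0.5

def proximity_score(distances_cm, language_weight):
    """Refined proximity score: every common data sample contributes
    a distance weight decaying exponentially from 2 for co-located
    avatars to about 0.5 at 400 cm, scaled by the language weight
    w_l of the pair."""
    decay = math.log(4) / 400.0  # so that weight(400 cm) ~= 0.5
    return sum(language_weight * 2.0 * math.exp(-decay * d)
               for d in distances_cm)
```

Pairs are then ranked by this score, and the highest-scoring pairs form the edges of the derived social network.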

Figure 8. Detail of the derived social network

4. Conclusions and Future Work

In our work we showed the use of a layered analysis approach. This approach aims to provide deeper insights into social interactions, user interest, and mobility patterns in virtual worlds. This type of analysis provides essential marketing information such as the attention, number, and social characteristics of users in a particular location. In addition to the generally used interpersonal distance data, geospatial, temporal, and linguistic information was used to derive a better social network and specific points of user interest. For future work it would be interesting to gather additional data for the validation of the derived results and the potential refinement of the algorithms and parameters used.

5. Acknowledgements

The authors wish to thank Wolfgang Halb and Patrick Höfler, who developed an early prototype of the social network analysis. Financial support from Styria Media AG is also gratefully acknowledged.

6. References

[1] Friedman, D., Steed, A., and Slater, M., "Spatial Social Behavior in Second Life", Intelligent Virtual Agents, Lecture Notes in Computer Science, vol. 4722, pp. 252-263, 2007. DOI: 10.1007/978-3-540-74997-4_23

[2] La, C. A., and Michiardi, P., "Characterizing User Mobility in Second Life", Proceedings of the ACM SIGCOMM Workshop, Seattle, WA, USA, August 2008.

[3] Yee, N., Bailenson, J. N., Urbanek, M., Chang, F., and Merget, D., "The Unbearable Likeness of Being Digital: The Persistence of Nonverbal Social Norms in Online Virtual Environments", CyberPsychology and Behavior 10, pp. 115-121, 2007.

[4] White, B., "Second Life: A Guide to Your Virtual World", Que Publishing, Indianapolis, USA, 2007.

[5] Fernandes, S., Kamienski, C., Sadok, D., Moreira, J., and Antonello, R., "Traffic Analysis Beyond This World: The Case of Second Life", Proceedings of NOSSDAV, Illinois, USA, June 2007.

[6] Second Life Web Map API, http://secondlife.com/developers/mapapi/, accessed 19 Jan. 2009.

[7] Journal of Universal Computer Science, Mapping Mashup, http://www.jucs.org/mashup/mashup.html, accessed 19 Jan. 2009.

[8] The Definitive Heatmap, http://blog.corunet.com/english/the-definitive-heatmap, accessed 19 Jan. 2009.

[9] Google AJAX Language API, http://code.google.com/apis/ajaxlanguage/, accessed 19 Jan. 2009.

[10] Gordon, R. G., Jr. (ed.), Ethnologue: Languages of the World, Fifteenth edition, SIL International, Dallas, TX, 2005. Online version: http://www.ethnologue.com/.

[11] Reardon, S. F., and Firebaugh, G., "Response: Segregation and Social Distance - A Generalized Approach to Segregation Measurement", Sociological Methodology 32, pp. 85-101, 2002.

[12] Shearer, J., Olivier, P., Heslop, P., and de Boni, M., "Requirements of Non-verbal Communication in Believable Synthetic Agents", Proceedings of the AISB Symposium on Narrative AI and Games, vol. 3, pp. 62-69, 2006.
