Social identity management in social networks
Diego Blanco, Jorge G. Sanz and Juan Pavón
Dep. Ingeniería del Software e Inteligencia Artificial
Universidad Complutense Madrid
Ciudad Universitaria s/n, 28040 Madrid, Spain
Abstract. Social networks and related Web 2.0 software attract a great deal of hype in both the business and the technical worlds. Because information spreads exponentially, with deep-rooted dependencies among networks of social networks, our digital identity may end up exposed in ways we would consider inappropriate. Beyond censorship, there is a growing need to know what is happening to the information we deliver somewhere on the net, and to manage both its distribution and the quality of the detailed information it contains. This paper discusses ways in which social identity management could be approached and further analyzed.
Keywords: Social networks, social network identity, privacy.
1 Introduction
Although social networks have been analyzed comprehensively in the last decade [1,2], the outburst of social sites has gained momentum only in the last three or four years. Not only have Facebook, Flickr, LinkedIn and tuenti achieved a notable impact on the web community and in the media, but around a hundred new sites have also appeared, each seeking its piece of the pie [3]. Although the popularity of these sites was first measured by visitor volume, the time spent and the traffic generated on them turned out to differ from what had until then been the norm. The revolution started by this ‘new old brand’ channel offers a distinctive approach to knowledge management about users.
People are connected by social links (friendship, colleagues, family, students, and so on) so that they can share a common set of interests, and those links depend strongly on trust in the individual. The people each of us decides to relate to are those we want in our ‘inner circle’. This pattern iterates until it potentially covers the whole of humankind (provided everyone shares the same kind of tools and belongs to the same network) [12]. Groups are created in the social network around common interests as a means of widening social connections. Those circles are exploited in different ways:
- Social activities
- Traditional or new-wave marketing
- Targeted cross-selling or up-selling
The game starts when analysis is done within the network. The reach of our presence in the social network is unprecedented and uncontrollable. Who is accessing our social network identity and data, and how? How can suitable filters be established for what can and cannot be seen or accessed? Is there any way of managing the behavior of our social identity? How should bounds and links across social networks be managed? How can the social engineering applied to social networks be harnessed? In other words, a prospective employer should not be able to find photos of our last private party in the Canary Islands. What are the risks of disorganized social growth? Will we reach a saturation point?
In the next sections a framework is established for analyzing the ways in which social identity should be managed. This paper is divided into 5 sections. This section introduces social networks and defines the context of the following analysis. Section 2 establishes the basic structure of the information needed to manage social identity within a social network. Section 3 reviews the paths along which social identity evolves in social networks from existing or inferred information. Section 4 analyzes the way in which information in social networks affects social identity and proposes a structure for managing it. Section 5 describes conclusions and future work.
2 Social network identity
There are social network identity initiatives such as OpenID [13], which focuses on a unique login for different web sites, also known as Internet Single Sign-On. This initiative lacks the semantics needed to discover what is known about a user or to manage that information. To cover this need, we introduce the concept of Social Network Identity.
Social Network Identity (denoted as SNI from now on) describes the potential information (what is known and what could eventually be guessed) about an individual within a social network. It is characterized by two different sets of attributes:
- Explicit attributes: all the information placed in the network on purpose about an individual. This is denoted as $EA = \{ea_1, ea_2, \ldots, ea_n\}$, the set of explicit attributes provided in a certain social network.
- Implicit attributes: all the information that could be inferred from the explicit attributes or from our behavior within the network. This is denoted as $IA = \{ia_1, ia_2, \ldots, ia_m\}$, the set of attributes discovered.
The SNI can be established as a function of these two sets. However, social networks have a strong grouping component, so the SNI is partitioned according to the different groups g an individual belongs to.
Explicit attributes represent what an individual wants other individuals in the social network to know about him/her. Filters can be defined so that access to his/her explicit attributes is limited [14], for instance, to the closer environment, to specific groups or to particular persons. Accessible information for a group g, denoted as $AI_g$, can be defined in terms of explicit attributes. The knowledge that individuals in a certain group should get from us can be expressed as:
(1)
SNI can now be expressed as:
(2)
EA can now be expressed as:
(3)
Whereas EA can be modified dynamically and depends on what kind of information users want to expose, the implicit attributes are not affected by filters but are inferred from what the user really does; this implies that IA affects every $AI_g$.
There is an important privacy issue to be considered here. Privacy breaches can be detected using the following function:
(4)
where sem is a function that analyses the semantics behind an attribute.
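As an illustration only, the sketch below models the two attribute sets, the per-group filters that yield $AI_g$, and one possible reading of the privacy-breach function (4). It assumes, which the text does not state, that every attribute carries a topic label and that sem simply returns that label, so that (4) becomes the size of the semantic intersection between exposed and inferred topics. The class Attribute and the functions sem, accessible_info and f_priv are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One piece of information about an individual (hypothetical model)."""
    topic: str  # semantic category, e.g. "sports"; returned by sem() below
    value: str  # what the attribute says, e.g. "dislikes"


def sem(attr: Attribute) -> str:
    """Hypothetical stand-in for sem: reduce an attribute to its topic."""
    return attr.topic


def accessible_info(ea: set[Attribute], filters: dict[str, set[str]],
                    group: str) -> set[Attribute]:
    """AI_g: the explicit attributes whose topic the group's filter allows."""
    allowed = filters.get(group, set())
    return {a for a in ea if sem(a) in allowed}


def f_priv(exposed: set[Attribute], ia: set[Attribute]) -> int:
    """Assumed form of (4): number of topics present both in the exposed
    attributes and in the inferred ones (size of the semantic intersection)."""
    return len({sem(a) for a in exposed} & {sem(a) for a in ia})


if __name__ == "__main__":
    ea = {Attribute("sports", "dislikes"), Attribute("employer", "UCM")}
    ia = {Attribute("sports", "follows The Football League all day")}
    filters = {"family": {"sports"}, "colleagues": {"employer"}}
    print(f_priv(ea, ia))                                          # 1
    print(f_priv(accessible_info(ea, filters, "colleagues"), ia))  # 0
```

Applying a group's filter before computing the overlap gives the per-group variant $f_{Priv}(AI_g)$ discussed in Section 3.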
3 Social network management: scenarios
Social identity management becomes complex when the value of (4) increases, thinning the line that separates what an individual wants the social network to know about him/her from what the social network effectively knows. For instance, an individual could state explicitly that he does not like reading about sports (an explicit attribute) while spending all day long online following what is happening in The Football League (an implicit attribute). The information effectively gathered is useless here if it cannot be managed properly while avoiding privacy issues.
SNI quality will depend on several premises:
- The more suitable the filters are for different groups, the better our digital identity will be managed. The function in (4) can be constrained to every $AI_g$ defined in (1), rewritten as $f_{Priv}(AI_g)$.
- The more disjoint and non-interfering EA and IA are, the easier it will be to manage the SNI without violating privacy.
SNI privacy considerations can be quantified using the following indicator:
(5)
By analyzing different scenarios derived from (5), the following graphic depicts the relation between the cardinalities of EA and IA and the cardinality of their semantic intersection.
Fig. 1. SNI indicator based on EA, IA and their semantic intersection
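Fig. 1 relates the indicator to the cardinalities of EA and IA and to their semantic intersection. A minimal sketch of how such an indicator could be computed follows. It is not formula (5); it assumes two hypothetical ingredients suggested by the text: the ratio |IA|/|EA|, reflecting how much the inferred identity weighs against the exposed one, and the Jaccard overlap of the two topic sets. Attributes are represented directly by their topics (the output of sem), and SNIIndicator and sni_indicator are invented names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNIIndicator:
    ea_size: int     # |EA|
    ia_size: int     # |IA|
    overlap: int     # |sem(EA) ∩ sem(IA)|
    ratio: float     # |IA| / |EA|: weight of the inferred identity vs the exposed one
    jaccard: float   # overlap / |sem(EA) ∪ sem(IA)|


def sni_indicator(ea_topics: set[str], ia_topics: set[str]) -> SNIIndicator:
    """Hypothetical indicator built from the topic sets of EA and IA."""
    if not ea_topics:
        raise ValueError("EA is empty, so |IA| / |EA| is undefined")
    common = ea_topics & ia_topics
    union = ea_topics | ia_topics
    return SNIIndicator(
        ea_size=len(ea_topics),
        ia_size=len(ia_topics),
        overlap=len(common),
        ratio=len(ia_topics) / len(ea_topics),
        jaccard=len(common) / len(union),
    )


if __name__ == "__main__":
    print(sni_indicator({"sports", "city", "employer", "music"},
                        {"sports", "religion"}))
```

Under this reading, a ratio at or below 1 corresponds to the desirable case in which the exposed identity weighs at least as much as the inferred one, and a ratio well above 1 to the avoidable case.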
There are desirable Social Network Management situations:
- Higher EA cardinality: implies better-controlled information placed in the social network.
- Lower , where an individual's exposed identity weighs at least as much as the inferred one: implies a margin of SNI management capability without violating the expressed privacy.
And there are also avoidable situations:
- Higher , where an individual's exposed identity weighs much less than the inferred one: implies a high risk of violating the expressed privacy.
Under this viewpoint, digital identity would fall within one of the following groups:
- Unmanageable: EA are very similar to IA and the cardinality of the semantic intersection is high. These SNIs make relationships very hard to manage, since one has to handle a double standard.
  Migration paths: in order to stop having an unmanageable SNI, several alternatives could be defined:
  - Behave as one has stated, so that the inferred attributes shrink in cardinality, thus reducing the cardinality of the semantic intersection.
  - Reduce the inferred attributes set.
  - A mix of both.
- Unmanageably active: IA is high compared with EA; someone does not want much to be known about himself directly, but participates so actively that the amount of inferred information is high. These SNIs lead to poorly managed relationships, although there is a lot of information that could be used to improve the individual's situation.
  Migration paths:
  - Make EA grow by enriching the individual's exposed attributes.
  - Reduce the inferred attributes set.
  - A mix of both.
- Partially active: neither the IA nor the EA cardinality is high; the exposure in the social network is not very high. With a low semantic intersection cardinality, the SNI could be managed properly without violating privacy.
  Migration paths:
  - Make EA grow by enriching the individual's exposed attributes while keeping the IA set at a low cardinality to avoid SNI degradation.
- Manageable SNI: EA cardinality is high and interferences with IA are manageable. This could be due to a lower IA cardinality with many coincidences or to a higher IA cardinality with few coincidences.
  Migration paths:
  - Behave as one has stated, so that the inferred attributes shrink, thus reducing the cardinality of the semantic intersection.
  - Reduce the inferred attributes set.
  - A mix of both.
- Managed SNI: EA cardinality is high, IA cardinality is low and there are few coincidences between both sets. This is the ideal situation, which minimizes privacy risks and simplifies SNI management.
SNI evolution under the described circumstances can be described using the following diagram:
Fig. 2. Social Network Management: SNI migration path
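The five groups of SNI can be read as a decision procedure. The sketch below is one hypothetical encoding: only the qualitative conditions come from the text, while the thresholds high and overlap_high are invented for illustration, and the overlap is normalised by the smaller of the two sets. The function name classify_sni is likewise an assumption.

```python
def classify_sni(ea_size: int, ia_size: int, overlap: int,
                 high: int = 20, overlap_high: float = 0.5) -> str:
    """Map an SNI onto the five groups of the migration-path analysis.

    overlap is the cardinality of the semantic intersection of EA and IA.
    high and overlap_high are hypothetical thresholds, not values from the paper.
    """
    ea_high = ea_size >= high
    ia_high = ia_size >= high
    smaller = min(ea_size, ia_size)
    ov_high = smaller > 0 and overlap / smaller >= overlap_high

    if ea_high and not ia_high and not ov_high:
        return "Managed"
    if ea_high and (ia_high != ov_high):
        # low IA with many coincidences, or high IA with few coincidences
        return "Manageable"
    if ov_high:
        # EA and IA overlap heavily: the individual handles a double standard
        return "Unmanageable"
    if ia_high and ia_size > ea_size:
        return "Unmanageably active"
    return "Partially active"
```

For example, with the default thresholds classify_sni(30, 5, 1) returns "Managed", while classify_sni(3, 40, 1) returns "Unmanageably active". Moving along a migration path then corresponds to changing ea_size, ia_size or overlap until a better group is reached.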
4 Managing SNI
Evolving EA is simple as long as people are interested in sharing more knowledge about themselves. But what happens when they are not interested in sharing? Their behavior will speak for them even if they do not want it to, and so will the information that is shared. How much does shared information affect the IA of an individual? This introduces another concept, the network impact value for information (denoted as NIVI from now on). NIVI is affected by:
- Information control (IC): whether the information can be controlled or not. A friend could be asked to remove a photo without problems, but asking an aggressive opponent not to speak ill of us could be quite difficult.
- Information relevance (IR): information about us generated in a closer circle seems more relevant than third-party gossip. IR depends on the network topology and the architecture of connections. The higher the relevance, the higher the NIVI.
- Deviation from exposed attributes (DEA): information whose semantics differs from the exposed one will have a higher DEA value.
The following formula describes the NIVI within an SNI for a piece of information i published in some node within the network:
(6)
NIVI describes the impact that information has on IA. Relevance and deviation from stated behavior will increase the NIVI, while greater control over the information will allow us to minimize it.
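To make the stated monotonicity concrete, the sketch below assumes one possible form for (6): with IC, IR and DEA each normalised to [0, 1], NIVI = IR · DEA · (1 − IC). This product is an assumption, chosen only because it increases with relevance and deviation and decreases with control, as the text requires; the function name nivi is hypothetical.

```python
def nivi(ic: float, ir: float, dea: float) -> float:
    """Hypothetical NIVI for one piece of information i.

    Assumes IC, IR and DEA are each normalised to [0, 1]. Impact grows with
    relevance and deviation and shrinks as control grows; full control
    (IC = 1) cancels the impact.
    """
    for name, value in (("IC", ic), ("IR", ir), ("DEA", dea)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return ir * dea * (1.0 - ic)


if __name__ == "__main__":
    # Same photo, same relevance and deviation; only the control differs.
    print(nivi(ic=0.9, ir=0.8, dea=0.7))  # held by a friend who will remove it
    print(nivi(ic=0.1, ir=0.8, dea=0.7))  # held by an aggressive opponent
```

The two calls mirror the photo example: the same information has far less impact when it sits with a friend than when it sits with an aggressive opponent.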
4.1 DEA analysis
The DEA function handles the gap between EAs and IAs. Let us imagine a photo in which someone is drunk with a couple of friends, and one of them uploads it to a social network to which all of them may or may not belong. How many of that individual's EAs are violated, or how many new IAs are defined? Information has to be decorated (annotated) before this kind of guessing is possible. Decorating can take different forms. For instance, pattern recognition could be used to guess whether his face is in the photo, or someone could tag the photo with his name along with terms such as his college, year, and so on. The inferred data generated could be overwhelming [15, 16].
Given that DEA will depend on the semantic richness of different pieces of information, the volume of attributes that could be discovered should be managed to obtain a controlled domain of information to work with. Let us denote by SR(i) the semantic relevance of a piece of information i. From the SNI point of view, all the information in the network should maintain a low SR(i) value, that is, be as semantically irrelevant as possible, so that the impact on IA is minimized. A photo of somebody taken 20 years ago without additional tagging will probably have a low SR. The same photo, tagged with college, year of the photo and class name, will carry semantic value, because the relationship of somebody with the photo could be guessed and further analyzed, so its SR will be higher. Imagine the scenario: if that person lives in a city with only one college, he/she could have kept his/her college name secret, but the second photo could help infer a good number of IAs: college, approximate age, and even a possible set of acquaintances (other persons in the same photo). SR could be expressed as a function of all the relevant attributes that could be discovered:
(7)
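The photo example can be sketched as follows, under assumptions of our own: each decoration of a piece of information (a tag, or a face matched by pattern recognition) makes certain attributes discoverable, each attribute has a hypothetical weight, and SR(i) is taken to be the total weight of the distinct attributes discovered. The weight table, the inference rules and the function names are invented for illustration and are not formula (7).

```python
import math

# Hypothetical weights: how revealing each discoverable attribute is.
ATTRIBUTE_WEIGHTS = {
    "identity": 0.3,       # who appears in the item
    "college": 0.3,
    "age": 0.2,            # approximate, from the year of the photo
    "acquaintances": 0.2,  # other people in the same photo
}

# Hypothetical inference rules: which decoration reveals which attributes.
DECORATION_REVEALS = {
    "face_match": {"identity"},   # pattern recognition found the face
    "name": {"identity"},
    "college": {"college"},
    "year": {"age"},
    "class": {"acquaintances"},
}


def discoverable(decorations: list[str]) -> set[str]:
    """Attributes that become discoverable from an item's decorations."""
    found: set[str] = set()
    for d in decorations:
        found |= DECORATION_REVEALS.get(d, set())
    return found


def semantic_relevance(decorations: list[str]) -> float:
    """Hypothetical SR(i): total weight of the distinct discoverable attributes."""
    return math.fsum(ATTRIBUTE_WEIGHTS[a] for a in discoverable(decorations))


if __name__ == "__main__":
    print(semantic_relevance([]))                            # untagged old photo
    print(semantic_relevance(["college", "year", "class"]))  # tagged photo
```

An untagged old photo scores 0, whereas the same photo tagged with college, year and class reveals college, approximate age and acquaintances and therefore scores higher. Because attributes are collected in a set, an attribute revealed by two decorations (a name tag and a face match) is counted once.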
Denoting by SK a function reflecting the knowledge about an individual in the social network in terms of EA, DEA could be defined in terms of SK and SR as:
(8)
or, more generally, for n attributes:
(9)
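One hypothetical reading of (8) and (9), summing over the attributes, is sketched below. SK is represented as the attributes the individual has exposed, and an attribute discovered from a piece of information contributes its weight to DEA when it is either hidden (absent from SK) or contradicts the exposed value. The function name dea, the dictionary representation and the weights are assumptions.

```python
def dea(discovered: dict[str, str], sk: dict[str, str],
        weights: dict[str, float]) -> float:
    """Hypothetical DEA(i): weighted count of discovered attributes that
    deviate from SK, i.e. are not exposed at all or contradict the exposed value.

    discovered: attribute -> value inferable from the piece of information i
    sk:         attribute -> value the individual has exposed explicitly (EA)
    weights:    attribute -> semantic weight of the attribute (as for SR)
    """
    return sum(
        (weights.get(attr, 0.0) for attr, value in discovered.items()
         if sk.get(attr) != value),
        0.0,
    )


if __name__ == "__main__":
    sk = {"sports": "dislikes", "city": "Madrid"}
    item = {"sports": "follows The Football League daily", "city": "Madrid",
            "college": "UCM"}
    weights = {"sports": 0.5, "city": 0.2, "college": 0.3}
    print(dea(item, sk, weights))  # sports contradicts, college is new
```

For the individual of Section 3 who states that he dislikes sports, an item revealing that he follows The Football League daily contributes the full weight of the sports attribute, while the part of the item that confirms his stated city contributes nothing.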
4.2 Information control
This parameter has to be analyzed in terms of social network metrics [2, 3]. IC depends on each individual's role in the network and on his/her ability to manage information. Lower values of Betweenness (people that can be reached indirectly from direct links) would increase our control of information, with a direct path to the information holders. The same would happen with Closeness, which measures the minimum paths needed to reach the places where the information is relevant. Eigenvector centrality (EC) measures the importance of each node in the network by assigning relative scores to all nodes depending on their connections.
(10)
IC will yield a higher value as our influence in the network increases, both by extending our reach (expanding our network value) and by shortening the number of hops needed to get where we want to. Finally, our node value in the network will normalize the control that can be exerted.
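The metrics mentioned above can be computed with standard algorithms. The sketch below uses only the Python standard library on an undirected, unweighted graph: BFS for closeness centrality, Brandes' algorithm for betweenness and power iteration for eigenvector centrality. It follows the usual textbook definitions (betweenness counts how often a node lies on shortest paths between other nodes). The function information_control is a hypothetical stand-in for (10) that only follows the directions stated in the text: higher with influence (EC) and with closeness centrality (shorter paths), lower with betweenness.

```python
from collections import deque

# Undirected graph as adjacency sets; every node must appear as a key.
Graph = dict[str, set[str]]


def bfs_distances(g: Graph, source: str) -> dict[str, int]:
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(g: Graph, v: str) -> float:
    """Closeness centrality: other reachable nodes / sum of hop counts."""
    dist = bfs_distances(g, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(g: Graph) -> dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack: list[str] = []
        pred: dict[str, list[str]] = {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(g, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice


def eigenvector(g: Graph, iters: int = 500, tol: float = 1e-12) -> dict[str, float]:
    """Power iteration on (A + I), scaled so the top node scores 1."""
    x = dict.fromkeys(g, 1.0)
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[w] for w in g[v]) for v in g}
        top = max(nxt.values())
        nxt = {v: val / top for v, val in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in g) < tol:
            return nxt
        x = nxt
    return x


def information_control(g: Graph, v: str) -> float:
    """Hypothetical IC: grows with EC and closeness, shrinks with betweenness."""
    return eigenvector(g)[v] * closeness(g, v) / (1.0 + betweenness(g)[v])
```

Iterating on A + I rather than A keeps the power iteration from oscillating on bipartite graphs without changing the eigenvectors.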
4.3 Information relevance
This function weighs the effect of the network architecture on the information. If the information is created by the most active and popular member of the network, its diffusion will have a greater impact on other people than information generated at the edges of the network. For instance, information communicated by Jesus in the Bible is much more relevant than that coming from Jonah. Information relevance has to be analyzed in line with the user network value (UNV) to establish the attention to be paid to it.
(11)
The user network value (UNV) is a function that reflects an absolute value of a user in a network based on his/her participation and relevance, in the same way as PageRank [17, 18] assigns a value to pages.
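Since UNV is described by analogy with PageRank, the sketch below simply uses PageRank, computed by power iteration over a directed graph of interactions between users, and takes IR(i) as the UNV of the author of i scaled by the highest UNV in the network. Both the choice of plain PageRank for UNV and this form of IR are assumptions standing in for (11); the function names user_network_value and information_relevance are hypothetical.

```python
def user_network_value(links: dict[str, set[str]], damping: float = 0.85,
                       iters: int = 100, tol: float = 1e-12) -> dict[str, float]:
    """PageRank-style UNV by power iteration.

    links[v] is the set of users v points to (follows, replies to, endorses).
    Users with no outgoing links spread their score uniformly.
    """
    nodes = set(links) | {w for outs in links.values() for w in outs}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new = dict.fromkeys(nodes, base)
        for v, outs in links.items():
            for w in outs:
                new[w] += damping * rank[v] / len(outs)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def information_relevance(author: str, unv: dict[str, float]) -> float:
    """Hypothetical IR(i): UNV of the author of i, scaled so the top user is 1."""
    return unv[author] / max(unv.values())
```

With links = {"a": {"popular"}, "b": {"popular"}, "c": {"popular"}, "popular": {"a"}}, information published by "popular" gets an IR of 1.0, while information from "c", whom nobody links to, gets the lowest relevance, in line with the contrast between the centre and the edges of the network.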
5 Conclusions
In this paper a framework has been introduced for analyzing social identity from two main standpoints: the evolution of social network identity, to align what is effectively known about individuals with what is known but should not be used in interactions; and network information management, that is, harnessing the impact derived from the existence of information in social networks and the insights that could be achieved.
Computer social networks are bringing new mechanisms for spreading information, owing to their natural correspondence to human social networks, and these mechanisms are somehow affecting the knowledge, and even the control, of what is out there about anybody. This framework aims to help in tracking and managing the social network identity, our digital blueprint, without limiting the inherent capabilities of the medium.
Future work will include the development of the User Network Value function, so that information can be aligned with the people managing it, simulation of social network identity value generation and migration paths, and a deeper analysis of the network impact value for information.
Acknowledgments
This work has been performed in the context of the research project INGENIAS 2, TIN2005-08501-C03, which has been funded by the Spanish Council on Research and Technology (with FEDER support).
References
1. John P. Scott. “Social Network Analysis: A Handbook”. Sage Publications Ltd, 2nd Edition. (2000)
2. S. Wasserman, K. Faust. “Social Network Analysis: Methods and Applications”. Cambridge University Press. (1994)
3. An De Jonghe. “Social Networks Around The World: How is Web 2.0 Changing Your Daily Life?”. BookSurge Publishing. (2008)
4. Javier G. et al. “Personalización: más allá del CRM y el marketing relacional”. Pearson. (2004)
5. Valdis Krebs. “The Social Life of Routers”. Internet Protocol Journal, 3 (December), pp. 14-25. (2000)
6. Thomas W. Valente. “Social network thresholds in the diffusion of innovations”. Social Networks, 18 (January), pp. 69-89. (1996)
7. Thomas W. Valente. “How to search a social network”. Social Networks, 27 (July), pp. 187-203. (2005)
8. Danah M. Boyd, Nicole B. Ellison. “Social Network Sites: Definition, History, and Scholarship”. Journal of Computer-Mediated Communication, 13 (October), pp. 210-230. (2007)
9. Roger Guimera, Marta Sales-Pardo, Luis A. N. Amaral. “Classes of complex networks defined by role-to-role connectivity profiles”. Nature Physics (January), pp. 63-69. (2007)
10. Ryan Bigge. “The cost of anti-social networks: Identity, Agents and neo-luddites”. http://www.firstmonday.org/issues/issue11_12/bigge/index.html (2006)
11. Danah Boyd. “Identity Production in a Networked Culture: Why Youth Heart MySpace”. American Association for the Advancement of Science. http://www.danah.org/papers/AAAS2006.html (2006)
12. Jeffrey Travers, Stanley Milgram. “An Experimental Study of the Small World Problem”. Sociometry, Vol. 32, No. 4, pp. 425-443. (1969)
13. “OpenID.net Project”. http://openid.net (2008)
14. Mark S. Ackerman et al. “The Do-I-Care Agent: Effective Social Discovery and Filtering on the Web”.
15. Wen-Syan Li, Y. Hara, N. Fix, S. Candan, K. Hirata, S. Mukherjea. “Brokerage architecture for stock photo industry”. 7th International Workshop on Research Issues in Data Engineering (RIDE '97): High Performance Database Management for Large-Scale Applications, p. 91. (1997)
16. Katerina Pastra, Horacio Saggion, Yorick Wilks. “Intelligent Indexing of Crime Scene Photographs”. IEEE Intelligent Systems, Vol. 18, No. 1, pp. 55-61, Jan/Feb. (2003)
17. L. Page, S. Brin, R. Motwani, T. Winograd. “The PageRank citation ranking: Bringing order to the web”. Technical report, Stanford University, Stanford, CA. (1998)
18. S. Brin, L. Page. “The anatomy of a large-scale hypertextual Web search engine”. Proceedings of the Seventh International World Wide Web Conference. (1998)