Entity Resolution in Online Multiple Social Networks (@Facebook and LinkedIn)
Ms. Ravita Mishra
Abstract: A social network is a platform on which users form relationships with friends, colleagues, and other people who share similar interests, activities, and real-life connections. Social networks are web-based services that let individuals build a personal profile (public or private) and maintain a list of users with whom they share connections, posts, and updates. Today, over 2.9 billion individuals across the globe use the Internet, and about 42% of these users actively participate in online social networks (OSNs). Recent statistics list about 226 active OSNs in 2017. Each OSN offers a distinct set of innovative services that ease access to information. For instance, Twitter's retweet feature enables quick access to news, campaigns, and mass information, while Pinterest's pinboards give access to the work of artists, photographers, and fashion designers. To enjoy these services, a user registers herself/himself on multiple OSNs. Research Center shows that 91% of users registered themselves on both Twitter and Facebook, and 52% on Twitter, LinkedIn, and Instagram. During registration on any OSN, a user creates an identity by listing personal information (profile), connections, and shared content. Because the policy and purpose of identity creation differ on each OSN, the quality, quantity, and correctness of the identity vary from one OSN to another. The result is dissimilar identities of the same user scattered across the Internet, with no explicit links pointing from one to another. These unlinked identities shield the user from the privacy concerns that would emerge if the identities were implicitly collated. However, disparate unlinked identities are a concern for various stakeholders. The findings of this analysis will be helpful in marketing and job recruitment, where a manager wants to check an employee's profile on both Facebook and LinkedIn: Facebook gives details of social activity, while LinkedIn gives professional connections and past experience. The analysis can also be used in the security domain, recommendation systems, human resource management, and advertising. The paper is organized as follows: part I contains an introduction to social networks, part II contains the literature survey, part III explains the limitations and research gap, part IV contains the proposed methodology and block diagram, and part V contains the conclusion and references.
Keywords: Identity resolution, Identity search, Jaro distance, OSNs (Online Social Networks), Finding Nemo, Nudge Nemo.

Ms. Ravita Mishra, Ramrao Adik Institute of Technology (RAIT), Nerul, Navi Mumbai-400706
1. Introduction
To collect information and build relationships, a user needs to make API calls, but each social network limits the number of calls permitted. This introduces the problem of limited information: access to the full graph is restricted, and the graph can only be explored through a limited number of API calls. Researchers and data scientists usually repeat their analysis multiple times over time to improve accuracy [27, 28]. Social network analysis helps industry enrich user experience and services, and it is also useful for searching and linking user accounts that belong to the same individual across popular online social networks. A difficult task in social network analysis is finding user behavior across multiple social networks [2]. Three important techniques for finding user behavior are self-identification, self-mention, and self-sensitive sharing. In self-identification, users explicitly mention their identities on other OSNs or on web pages using hyperlinks. When only the username is available, it is not enough to locate a user's account; other parameters such as basic information and descriptions have to be added, and private profile information has also been used for analysis [10]. When a search parameter is not supported by the API, identity search fails, and it is challenging to retrieve candidate identities similar to the searched user identity on that parameter. Existing methods also need to be modified to start with a Facebook user identity and find the corresponding Twitter identity. User identity linking methods fail in some cases as well.
2. Review of Research and Development in the Field
Entity/identity resolution issues can be divided into two subproblems, namely identity search and identity matching. Many surveys have described identity resolution and matching techniques for connecting networks in the real world, but they do not analyze identity search techniques for finding candidate identities. Entity resolution is a crucial issue that has been addressed by many researchers in the past (Getoor & Diehl (2005); Brizan & Tansel (2006); Elmagarmid et al. (2007); Benjelloun et al. (2009)).
Omar Benjelloun et al. [6] describe the D-Swoosh algorithm, whose main task is to distribute the entity resolution workload across many machines. The approach uses generic match and merge functions, and the algorithm ensures that newly merged records are distributed to all machines that may hold matching records [1]. The authors perform a detailed analysis on a testbed of 15 processors, covering both the case where application expertise can eliminate some comparisons and the case where all records must be matched. They also find that applying domain knowledge is very difficult, as it needs a good distribution of records across syntactic groups.
Olga Peled et al. [18] introduce methods for solving entity resolution problems that involve matching user profiles across multiple OSNs [16]. Supervised learning (classification) is used to match two user profiles from two different OSNs, based on features extracted from each profile. The classifiers perform entity matching between two user profiles in three scenarios: first, matching entities across two online social networks; second, searching for a user by the same name; and third, de-anonymizing a user's network. The classifiers use multiple kinds of features: 1. username-based features; 2. data-based features; and 3. topology-based features. The model was evaluated on real-life data collected from two popular OSNs, Facebook and Foursquare. In this research, the LogitBoost algorithm appeared to be the most appropriate of the classification algorithms tried [18]. The method has several limitations: finding the similarity between users is difficult, the choice of classification algorithm affects the result, and accuracy is limited because only public profiles are available.
Azadeh Esfandyari et al. [11] observe that people open several accounts on diverse online social networks (OSNs) to enjoy different services, so they have many different types of profile information, and these pieces of information from various sources can be brought together by identifying an individual's profiles across social networks. The authors address this problem by treating user identification as a classification task.
In this method, common public attributes that can be accessed through the application programming interface (API) are used as features. Because the classifier requires training, the authors propose different methods for building negative instances and compare them with the usual random selection to investigate the effectiveness of each method [35]. The effectiveness of the approach is measured on real data by collecting profiles from different OSNs such as Google, Facebook, and Twitter.
Cheng-Ta Chung et al. [17] study the problem of linking the same user across different social networks and provide a basic solution for comparing two user profiles. The method is further enhanced by considering social associations, i.e., friends. The authors also propose a two-phase (hybrid) clustering method that generates a summary for each person. In the first phase, the algorithm selects actively connected groups, which are treated as seeds. In the second phase, it assigns non-seeds to the groups based on the profile similarity of the persons. The final distribution over the groups is regarded as the social summary. The methods are tested on two real social networking datasets gathered through APIs; the results are satisfactory in terms of the feasibility of the method, and the approach is also compared with matching on the person-name attribute of the two profiles.
Oana Goga [12] focuses on the privacy and security concerns of individuals. From a security point of view, the terms of service of social networks specify that a user can create only a single account. However, many users create more than one account, and malicious users often pose as honest users. A single user's information is also spread over different sites, which results in many accounts for one user. To address this, entity matching techniques are required that identify the accounts of a single person. The author considers four properties for evaluating profile attributes: A (Availability), C (Consistency), I (Non-impersonability), and D (Discriminability). These properties are used to assess the quality of different profile attributes for matching accounts. The author also uses different matching schemes:
Classifying the multiple accounts that belong to one person: Mobile phones are a good example for identifying the accounts of a single person, because their sensors can log almost all user activities and give a complete picture of a person. Sensors are used in many applications because they provide different monitoring services, such as applications that monitor sleep, heart rate, or the number of steps taken each day [24]. Sensors existed in earlier days, but their use has increased widely, and more sensors will surround us [12].
Using the graph structure of social connections to boost account matching: Graph de-anonymization techniques are used here. The main idea is to start with a small number of seed nodes for which the corresponding accounts on the two graphs are known, and then to propagate the matching from the seeds by measuring the similarity between pairs of nodes (u, v) on the two graphs.
Sybil detection using cross-site information: Current research focuses on how to evaluate the identity of honest users by verifying their identity details on different social networks such as Xing, Facebook, LinkedIn, or Google+ [19, 20]. Available systems for detecting fake or untrustworthy identities rely on evaluating information that is available on a single site or domain.
Finding what makes a person popular across different sites: A person's popularity is an important parameter for correlating different social networks. On OSNs such as Facebook, Twitter, and LinkedIn, users are mostly interested in making their accounts more popular than those of their friends [28, 30]. Investigating a number of questions about popularity gives a better understanding of what makes one account more popular than another.
Understanding how information disseminates from one social network to another: The main idea is to understand whether information propagated from one social network to another is authentic or not [22].
Recommenders with cross-site information: Recommender systems play an important role in recommending content inside a system using the information gathered inside that particular system [24]. Cross-site recommendation becomes possible when the user's account details on different social networks are known.
W. M. et al. [15] provide a new technique for integrating information from multiple sites to create a complete picture of user activities, characteristics, and trends. The authors propose three categories of methods: profile, content, and graph. Profile-based methods use approximate string matching techniques; content-based methods perform author identification; and graph-based methods apply new cross-domain community detection methods and generate neighborhood-based features [26]. Limitation: One issue in training the fusion models is the problem of missing data. For example, the content features may be missing for some users due to an insufficient number of posted words.
Kai Shu et al. [13] start from the increasing popularity and diversity of social media, which encourages many people to enroll on multiple online social networks to enjoy their benefits and services. At registration, a user creates an identity that represents her/his unique public image on each OSN. Identity linking techniques are applied in different domains such as recommendation systems and link prediction. The data in social networks are vast and complicated, and their unique characteristics raise various challenges; to address them, new approaches first extract different features and then build predictive models from various perspectives.
Paridhi Jain et al. [2, 3] explain how a single user registers on multiple social networks to enjoy their different services. At registration, the user creates an identity, which comprises three major dimensions: profile, content, and network [4]. The user largely governs his/her identity on any social network and can therefore manipulate multiple aspects of it, so no one can uniquely mark her presence across online social networks. The literature has proposed identity search methods based on profile attributes but has left the other identity dimensions, content and network, unexplored. The authors introduce two identity search algorithms based on content and network attributes, which improve on the earlier identity search algorithm. They also deploy an identity resolution/search system called Finding Nemo, which uses these identity search methods to find a Twitter user's identity on Facebook, and they propose identity search algorithms that access public as well as private profile information of users.
3. Limitation of Existing System
Limitations/Research Gap: Entity resolution across multiple online social networks has various research gaps, and existing systems have various limitations, which can be addressed by improved methods and techniques [11]. The limitations and research gaps are listed below:
3.1 Limited information: Third parties (researchers and data scientists) who would like to perform network analysis can only access the social graph through a limited application programming interface (API) [7], so the information they see is limited. For example, an intelligence agency collecting information about suspicious persons or terrorists wants to merge the data for the same user across different online social networks to obtain a better picture of a suspect. Social networks also allow users to generate content, and user-generated content is usually messy (it contains typos, malformed input, and bad GPS locations in the check-in domain) and has many duplicates; it also causes the problem of graph tracking (updating).
3.2 Crowdsourcing for ER: In this scheme, machines first compute the probability that each pair of records refers to the same underlying real-world entity [21]. Humans are then asked to resolve record pairs, and transitivity is leveraged to avoid asking humans to resolve every pair.
3.3 Imperfect crowd: Simple techniques such as majority voting can reduce human errors. However, the intermediate results will contain some unresolved record pairs for which no agreement has been reached [21]. It would be useful to utilize these intermediate answers to make inferences about record pairs.
3.4 Improving crowd ER: The machine can update its model based on each answer from a crowd worker and recalculate the probabilities of the record pairs that have not yet been put to humans. Each human answer can be used as a new training example to update the machine learning (ML) model. Because the probabilities are constantly recalculated and the pairs to ask humans are reordered, proving any theoretical bound on the number of questions would be non-trivial [21].
3.5 Limited API: A limited API introduces a new challenge: each machine requires an access token, obtaining an access token takes time, and social networking systems usually prevent users from creating many tokens [20]. To obtain an access token on Twitter, one needs to first create an account, then create an application, and finally create an access token [7]. Twitter may suspend the account if the user has created too many tokens.
3.6 Identity linking using username only: The history of other attributes such as profile picture and description, which change more frequently than the username, could be used to further link user profiles [21]. This was not done because data for the respective attributes were not available on other OSNs [2, 9].
3.7 Identity search dependency on API: If a search parameter is not supported by the API, it is challenging to retrieve candidate identities similar to a searched user identity on that parameter [5]. The methods are therefore asymmetric, i.e., they need to be modified to start with a Facebook user identity and find the corresponding Twitter identity.
3.8 Evaluation of self-identified users: Ground-truth datasets of real-world users, used for evaluating identity search and linking methods, contain users who explicitly self-identify their identities on multiple social networks [12]. A validated dataset of users who do not explicitly identify their own accounts across OSNs is challenging to gather. Therefore, the applicability and performance of our methods on non-self-identified users are difficult to examine.
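The crowdsourcing scheme of 3.2 can be illustrated with a minimal sketch, assuming positive transitivity only (if a matches b and b matches c, then a matches c) and a perfectly reliable answer function. The names `UnionFind`, `resolve_with_crowd`, and `ask_human` are hypothetical and are not taken from the cited work; negative inference and the imperfect-crowd case of 3.3 are deliberately left out.

```python
class UnionFind:
    """Tracks which records are already known to be the same entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve_with_crowd(scored_pairs, ask_human):
    """Ask humans about record pairs in order of decreasing machine probability.

    scored_pairs: iterable of (probability, record_a, record_b).
    ask_human:    hypothetical callback returning True if the pair matches.
    Returns the union-find structure and the number of questions asked.
    """
    uf = UnionFind()
    questions = 0
    for _prob, a, b in sorted(scored_pairs, reverse=True):
        if uf.find(a) == uf.find(b):
            continue  # already implied by transitivity, no question needed
        questions += 1
        if ask_human(a, b):
            uf.union(a, b)
    return uf, questions
```

In this sketch, a pair whose two records already sit in the same group is skipped, which is where the saving in human questions comes from.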
Table 1 Data Extracted (Attributes) From Facebook and LinkedIn

Facebook: UserID; Username, Name; Profile Picture; Location; Gender; Birthday; Hometown; City; Languages Known; Friends; Religion views; Political views; Favorite quotes; Relationship status; Websites/Webmails; Networks; Relatives; Primary Education (primary school/college name, class, type of degree); Professional qualification/Experience (employer name, position, title, start date, end date, description)

LinkedIn: ID Type (Passport, Identity card, Driving Licence); Name; Profile Picture; Company Name, Company location; Gender; Birthday; Location; City; Experience; Skill; Past Experience; Region; Preferences; Language; Professional Skills; Groups; Connections; Recommendation; Educational Qualification; Professional Skills (employer name, position, description)
Facebook lets a user first register and then create a user profile, upload photographs and videos, send messages, and keep in touch with friends, family, and colleagues [29, 30]. LinkedIn, on the other hand, is quite different from Facebook: its main task is to maintain a professional image, network, and brand, and it is also helpful for finding a job, networking, endorsements, selecting new employees, getting sales leads, and getting employment news, marketing, etc. [31, 32, 33]. The main aim is to collect the common attributes of Facebook and LinkedIn that are helpful in identity matching.
4. Objective/Methodology of Proposed System
4.1 Objective/Methodology
Entity resolution faces the various problems discussed above. It is difficult to find the same entity on multiple social networks (Facebook and LinkedIn); our main objective is to find an identity on multiple social networks in a distributed system [16]. In available systems, the shared budget and the privacy policy are also big challenges. We try to address these problems by using separate APIs; the main objectives of the proposed system are listed below [17, 18, 25]:
4.1.1 ER on multiple social networks: Prior work considers the problem of matching users between two social networks (Google+ and Twitter) [19]. Our objective is to obtain a holistic picture of each person's identity by unifying his/her information from multiple social networks. For example, say we have three social networks A, B, and C. After we have matched some users between A and B, we can define a unified graph D that aggregates information from A and B. We can use D to help us match users between A and C and between B and C [20]. We update D to reflect new matching pairs as we match more users between different networks. Then we can go back and attempt to match more users between A and B again. We also look for the person's online footprint (web search results, personal websites, and online profiles). For example, we could use DataMiner to scrape and parse LinkedIn People Search results to extract full name, job title, company, and location. In addition, we could augment the online footprint to help resolve the person's identity [8].
4.1.2 Shared budget: Each API operation can have a different budget value. For example, every 15 minutes Twitter gives each user a budget of 15 API calls for each of the following operations: inlinks, outlinks, and friendships. In this case, the optimization is simply which node to make the API call on. Some operations take more than one node: for example, the relationships operation requires two nodes to be passed as parameters [19], so with a budget of 15 relationships calls we decide which node pairs to make these calls on. Another type of budget is a single budget shared among API operations. In this case, the budget is split among the operations.
The user decides not only which node but also which operation to make an API call with, choosing the call that will yield the highest information gain as we probe the graph.
4.1.3 Distributed ER: In distributed ER, multiple machines are used. We assume that we have full access to the first graph (G) but limited access to the second graph (T), and we continue using this problem setting. We partition the first graph (G) and split the work of probing the second graph (T) among different machines. Each machine gets a partition of G and performs ER on that partition. We would like to find matching nodes in T for a number of G nodes [8, 19]. The node correspondences learned by one machine may be useful to another machine that shares some nodes and edges [8]. Thus, we can share this information among machines to help resolve more nodes. At the end, we combine the results from all machines. We can utilize and adapt graph partitioning techniques to split the ER task among different machines.
In the proposed system, the first block extracts data from the social network sites (Facebook and LinkedIn), and the next block preprocesses/cleans the data and stores it in a flat file. The third block performs the identity search for the first site, performs attribute matching, and creates candidates; the next block extracts the shortlisted candidates. Next, content matching is performed on the shortlisted candidates. The final block gives the matched Facebook user identity [17].
4.2 The architecture of proposed system: The proposed system consists of the following blocks:
4.2.1 Data Extraction: The data will be gathered through the API of the online social network.
4.2.2 Data Cleaning/Preprocessing: The data obtained will be cleaned by removing any missing or irrelevant data.
4.2.3 Data Storage: The data collected will be stored in a flat file.
4.2.4 Methodology: A data dictionary will contain the most used words on social media sites. The data dictionary can be used to get more keywords from the content of the user [33, 34].
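The cleaning and storage blocks (4.2.2 and 4.2.3) could look roughly like the following sketch. It assumes that extracted profiles arrive as Python dictionaries and that the flat file is a CSV file; the field names, the choice of required fields, and the function names are illustrative assumptions, not part of the proposed system's specification.

```python
import csv

# Hypothetical attribute names; the real set would come from Table 1.
FIELDS = ("name", "location")


def clean_records(records, fields=FIELDS):
    """Keep only the chosen fields, normalize whitespace, drop incomplete rows."""
    cleaned = []
    for rec in records:
        # Missing or None values become empty strings; runs of spaces collapse.
        row = {f: " ".join(str(rec.get(f) or "").split()) for f in fields}
        if all(row[f] for f in fields):
            cleaned.append(row)
    return cleaned


def write_flat_file(rows, handle, fields=FIELDS):
    """Write the cleaned rows to an open text file handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

A caller would open the target file with `open(path, "w", newline="")` and pass the handle to `write_flat_file`; irrelevant attributes are dropped simply by not listing them in `FIELDS`.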
[Figure 1: block diagram with the boxes: Extract data from social network 1 (Facebook); Clean and process the data and store in flat file; Extract data from LinkedIn and identify user; Apply attribute matching algorithm and generate candidate key; Extract the matched contents of shortlisted candidate key; Apply contents matching algorithm to shortlisted candidate (user); Display the matched Facebook user]
Fig. 1 Block diagram giving a brief idea of the proposed system (extracting data and displaying the matched entity)
4.3 Collective Clustering Algorithm: Collective clustering is used to match attributes from different social networks and solve the identity resolution problem [5]. The method is a greedy agglomerative clustering algorithm whose main purpose is to find the most similar clusters and then merge them. Computing cluster similarity requires clusters of references [26, 35]. Each cluster is taken to represent one real-world entity [9, 36]. Whereas other techniques such as transitive closure use single-linkage clustering, collective clustering uses an average-linkage approach: it defines the similarity between two clusters c_i and c_j as the average similarity between each reference in c_i and each reference in c_j, as expressed by the equation:

$$\mathrm{sim}(c_i, c_j) = \frac{1}{|c_i.R| \times |c_j.R|} \sum_{r \in c_i.R,\; r' \in c_j.R} \mathrm{sim}(r, r') \tag{1}$$
Here |c.R| represents the number of references in cluster c.
Pseudo Code for Collective Clustering
Input: Reference of each cluster
Output: Similarity between two clusters (and the resulting merged clusters)
1. Initialize each reference as a cluster
2. Assign threshold value (t)
3. Compute similarity (sim) between each cluster pair
4. Find the cluster pair with maximal sim(ci, cj)
5. Compare sim(ci, cj) with the threshold t
6. If sim(ci, cj) is greater than the threshold
7. Merge ci and cj
8. Go to step 3
9. End (reached when no cluster pair exceeds the threshold)
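The pseudo code above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the reference-level similarity function `sim` is supplied by the caller, only the average-linkage similarity of Equation (1) is used, and the function names are hypothetical.

```python
from itertools import combinations


def cluster_similarity(ci, cj, sim):
    """Average pairwise similarity between two clusters, as in Equation (1)."""
    total = sum(sim(r, s) for r in ci for s in cj)
    return total / (len(ci) * len(cj))


def collective_clustering(references, sim, threshold):
    """Greedy agglomerative clustering with average linkage.

    references: list of reference values (for example, profile names).
    sim:        caller-supplied function returning a similarity in [0, 1].
    threshold:  merging stops once the best pair does not exceed this value.
    """
    clusters = [[r] for r in references]  # step 1: one cluster per reference
    while len(clusters) > 1:
        # Steps 3 and 4: score every cluster pair and take the best one.
        best_score, (i, j) = max(
            (cluster_similarity(clusters[a], clusters[b], sim), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_score <= threshold:  # steps 5 and 6
            break
        clusters[i] = clusters[i] + clusters[j]  # step 7: merge ci and cj
        del clusters[j]
    return clusters
```

Recomputing every pair after each merge is quadratic per iteration; that is acceptable for a sketch but would need caching or an index for real profile collections.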
The above algorithm gives the similarity between attributes of the social networks Facebook and LinkedIn [28, 32]. Attributes are taken from LinkedIn, the matching algorithm is applied, and it gives the matched Facebook user identity. For name matching, we use character and length similarity; for company name and affiliation, Jaccard similarity is used, which compares two sets. Equation (1) computes the similarity between two clusters; the matched attributes help decide whether the accounts belong to the same individual or not [21].
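As an illustration of the Jaccard similarity used for company name and affiliation, the sketch below compares two strings as sets of lower-cased word tokens. Tokenizing on whitespace is an assumption made here for simplicity; the paper does not state how the sets are built.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lower-cased word tokens."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        return 0.0  # two empty values give no evidence of a match
    return len(set_a & set_b) / len(set_a | set_b)
```

For example, "Acme Software Ltd" and "acme software" share two of three distinct tokens, so the similarity is 2/3, while two names with no common token score 0.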
[Figure 2: flow diagram. LinkedIn attributes (Name, Occupation, Education, Location) and Facebook attributes (Name, Occupation, Education, Location) are the input to attribute matching (L.name=F.name, L.Occup=F.Occup, L.Educ=F.Educ, L.Loc=F.Loc). Candidates whose matched attributes >= threshold are shortlisted. The LinkedIn contents of the user (Likes, Location, Following, Connections) and the Facebook contents of the user (Likes, Location, Follows, Friends) go to content matching, which checks the minimum distance between contents and shows the matched Facebook account.]
Fig. 2 Attributes of the two social networking sites and the use of the proposed algorithm to find the matched entity
Proposed Algorithm of Entity Resolution System: Some of the journal papers referred to proposed an identity resolution system using clustering methods, in which the networks of individuals in a social group were clustered together by the collective clustering algorithm. To optimize the performance of the identity resolution process, we propose a system that uses a pairwise string matching algorithm [9, 35].
Pseudo code for Proposed algorithm (Attribute Matching Algorithm)
Let P1 and P2 be the two profiles (Facebook and LinkedIn), each having profile attributes A1, A2, …, An.
Input: Profile attributes of one online social network
Output: Matched identity on the second online social network
1. Start.
2. Assign threshold value.
3. Input the name of profile P1.
4. Input the name of profile P2.
5. Find the similarity between (P1.A1, P2.A1).
6. IF the similarity is greater than the threshold value, predict a match and assign value 1; ELSE assign value 0.
7. Repeat for all n attributes.
8. IF the number of values equal to 1 is greater than the threshold, the two profiles belong to the same person.
9. Profiles are similar.
10. End.
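A minimal Python sketch of the attribute matching pseudo code is given below. It assumes profiles are dictionaries of strings and uses `difflib.SequenceMatcher` from the standard library as a stand-in string similarity; the paper's own similarity measures (character and length similarity, Jaccard, Jaro distance) could be substituted. The two thresholds and all names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def attribute_similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the measure used in the paper."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def profiles_match(p1, p2, attributes, attr_threshold=0.8, vote_threshold=2):
    """Compare two profiles attribute by attribute and count the matches.

    Each attribute whose similarity exceeds attr_threshold contributes a 1
    (steps 5 to 7); the profiles are declared the same person when the number
    of 1s is greater than vote_threshold (step 8).
    Returns (is_match, number_of_matching_attributes).
    """
    votes = 0
    for attr in attributes:
        v1, v2 = p1.get(attr, ""), p2.get(attr, "")
        if v1 and v2 and attribute_similarity(v1, v2) > attr_threshold:
            votes += 1
    return votes > vote_threshold, votes
```

With the four attributes of Fig. 2 (name, occupation, education, location) and `vote_threshold=2`, two profiles are linked when at least three attributes agree; a missing attribute on either side simply contributes no vote.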
5. Conclusion

Identity matching is the central problem of entity resolution in social network analysis, and matching an entity across Facebook and LinkedIn is a difficult task. Earlier work addressed it either by simply matching the profile attributes of a user in two OSNs or by applying clustering algorithms individually. Here, I attempt to overcome the drawbacks of these existing systems and to address the different problems of identity resolution across multiple OSNs. I propose a new identity resolution technique that performs pairwise matching on identity attributes, content matching, and self-mention search; it accesses only public information and finds the candidate identities. The proposed system can be used by intelligence and law enforcement agencies, which frequently suffer from the missing-data problem. In addition, the identity resolution system not only handles duplicates caused by entry errors but also handles data uncertainty. The proposed system can also help analysts find user identities (e.g. spammers) across different OSNs.

This work focuses on two social networks, LinkedIn and Facebook, and also tries to match a few uncommon attributes of LinkedIn that help to find a person's real identity. The work can be extended to other social networks, for example to identify the real identity of a person across Facebook and Instagram. The proposed techniques are helpful in a distributed environment: the proposed identity search algorithms access public information and some important attributes that help to find candidate identities and other identities. The improved identity matching algorithms match the candidate identities with the given identity and find the unique identity of the person. The method also tries to obtain the Facebook and LinkedIn sharing token value using the API, which will improve the search, and it defines the cost of each attribute.

This paper gives an idea of techniques that solve the problem of identifying user accounts across multiple online social networking sites and that increase the accuracy of finding the correct identities of users. The technique merges the information of a single user who has accounts on multiple online social networking sites. This aggregated information about a single user is useful in developing many applications, such as:
Security domain: Malicious users create multiple accounts on different social networking sites to extend their reach to targets; linking these accounts can help to identify such users.
Recommendation domain: The system is helpful in building a friend recommendation feature. Given a user's friends on one social network, the feature can find those friends' identities on other social networking sites and suggest that the user connect to them.
Enterprises: One problem that e-businesses face is that they cannot identify their social audience exactly, which they need in order to calculate their return on investment (ROI). Deduplicating the social audience by linking online social networking accounts can help them calculate their ROI.
Human Resource management: HR managers can use the system to check the profile of a candidate on different social platforms.
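The pairwise matching on identity attributes summarized in this conclusion can be illustrated with a minimal sketch. It is not the implementation used in this work: the attribute names, the weights, the threshold, and the helper names (similarity, match_score, best_match) are all hypothetical, and string similarity is approximated with SequenceMatcher from the Python standard library. Attributes missing from either profile are skipped, which is one simple way to tolerate missing data.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights, for illustration only (not values from this paper).
WEIGHTS = {"name": 0.5, "location": 0.2, "employer": 0.3}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(given: dict, candidate: dict) -> float:
    """Weighted average similarity over the attributes present in both profiles."""
    total = 0.0
    weight_sum = 0.0
    for attr, weight in WEIGHTS.items():
        x, y = given.get(attr), candidate.get(attr)
        if not x or not y:
            continue  # attribute missing in one profile: skip it
        total += weight * similarity(x, y)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def best_match(given: dict, candidates: list, threshold: float = 0.8):
    """Return the highest-scoring candidate, or None if none reaches the threshold."""
    scored = [(match_score(given, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None


# Example: a given identity and two candidate identities from another network.
given = {"name": "Jane Doe", "location": "Mumbai", "employer": "Acme"}
candidates = [
    {"name": "jane doe", "location": "Mumbai"},  # employer not public
    {"name": "John Smith", "location": "Delhi", "employer": "Other"},
]
print(best_match(given, candidates))
```

In a fuller system, the weights would reflect the cost or reliability of each attribute, and content matching and self-mention search would contribute additional evidence before a candidate identity is accepted.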