A Recommendation System for Twitter Users in The Same Neighborhood
Meshary AlMeshary
Department of Computer Science, Ryerson University
[email protected]
Abdolreza Abhari
Department of Computer Science, Ryerson University
[email protected]
ABSTRACT
This paper proposes that Natural Language Processing, the Google Translate API and the Fuzzy Set concept can be combined to help a Twitter user who has moved to a new society or culture, for example, a student moving to Toronto, Canada from his/her native country of Saudi Arabia. The proposed system takes the user's past tweets in his/her mother language and predicts his/her interests after the move to the new location. It then translates these interests, searches locally, and recommends followees, followers and hashtags based on them. The system will help the user become aware of what is going on in the new location, make connections, and engage with the new city.
The proposed system focuses on two languages only: Arabic and English. Consider an Arabic-speaking Twitter user who tweets in Arabic, whose followers and followees also tweet in Arabic, and who has moved to a place or country where the spoken language is English rather than Arabic. We analyze some of the user’s tweets or retweets in Arabic to find the user’s interest topics and hashtags, and then translate them into English. For the translation we use Natural Language Processing and Google Translate APIs to obtain appropriate translations. We then use the translated interest list to look for local tweets, that is, tweets from people using Twitter in his/her neighborhood, that reflect the same interests. The user receives recommendations from the proposed system about local Twitter users who have the same interests and likes.
General Terms
Conceptual Fuzzy Set, Language-Game and Natural Language Processing.
Keywords
Twitter Geolocation
1. INTRODUCTION
Twitter is a social website where people share messages with a large number of users across the world. More than 500 million Twitter users post more than 400 million tweets per day [1], [2]. Unlike other social networks (e.g., Facebook), the relationship between Twitter users can be social, informational or both, whereas the relationship between Facebook users is mainly social (e.g., family and friends). Recently, Twitter has gained popularity and plays an increasingly important role in our communications, our society, our entertainment, our business and our education [3].
This paper discusses a novel idea that helps Twitter users who have moved to a new society or culture by recommending local tweets and hashtags based on their interests. No matter what language the user speaks, the system lets him/her know what is going on nearby and might spark his/her interest.
Some terms used in this paper:
- Tweet: an online post limited to 140 characters explaining “What’s happened”; a link or picture can be embedded in it.
- Retweet: a resend or repost of another user’s tweet or of your own tweet.
- Hashtag: a part of the tweet that usually refers to the tweet’s topic. It is a number symbol (#) placed before a keyword in the tweet. Hashtags are optional, and a tweet can contain more than one.
- Followers: Twitter users who want to receive or read your tweets.
- Followees: Twitter users whose tweets you want to receive and read.
- Local Tweets: tweets from Twitter users in your area (your physical location).
- Tweet Geolocation: geolocation (place []) is a term used in the Twitter API to describe where Twitter users are physically located when they post a tweet. It is defined as latitude and longitude coordinates, or sometimes latitude, longitude and altitude. See figure 2.
2. RELATED WORKS
Recommendation systems [4] are information filtering systems that suggest new items (e.g., books, songs or video games) for users to buy, or recommend friends to users. There are two types of recommendation systems: personalized and non-personalized. Personalized recommendation systems take the users’ preferences into account; examples are the recommendation systems used in Twitter, Facebook and Xbox.
Non-personalized recommendation systems do not use the users’ preferences; for example, they recommend a Top Ten list of songs or apps in iTunes based on the current month's sales. Personalized recommender systems utilize characteristics of items, profiles of users and the interactions or transactions between users and items to predict the users’ future item adoptions. Personalized systems have three different filtering approaches. The first is collaborative filtering, which breaks down into two categories. In user-to-user collaborative filtering, for example, Twitter user “A” follows users “B” and “C” based on a mutual interest in Jazz music, and user “D” is followed by users “B” and “C” based on the same interest, so the recommendation system in Twitter recommends that user “A” follow user “D”. Item-to-item collaborative filtering is used by the Microsoft Xbox recommender system: items A and B are highly similar if a relatively large portion of the users who purchase item A also buy item B, and the preference of a user for an unrated item B is then predicted from the user’s rating of item A.
The second is the content-based filtering approach, which finds similar items by comparing their features and characteristics. An item is then recommended to a user who has liked or purchased similar items before. The third recommends items based on the preferences of a user’s friends or on other social media such as tags and comments. The approach we use in our system is content-based filtering, to recommend local tweets and hashtags.
Conceptual Fuzzy Sets (CFS) [5], [6] have been used with a language-game, in which a specific word is described by other words. The meaning of a word may be defined by how the word is used as an element of language, so a word may take different meanings according to how it is used in a language-game. For example, the meaning of “tall” differs between people and buildings: we may consider an adult person tall if his/her height is between 180 cm and 210 cm, whereas a tall building might be more than 150 m high. The meaning of a word also depends on its part of speech; for example, “fall” as a noun is entirely different from “fall” as a verb, and the context decides: “The leaves on the tree fall (verb) in the season of fall (noun)”.
There are many recommendation systems that have been created for Twitter. Each one has its own algorithms and filtering approaches. In this paper, we focus on tweet geolocation to find out who is tweeting nearby. We also use conceptual fuzzy set theory to predict the user’s interests by analyzing his/her tweets and then comparing them with local tweets to expand the user’s keyword list.
2.1 Followee Recommendation
In Twitter, users are interested in finding not only their friends but also new relevant contacts not yet known to them. A user may follow other users whom he or she does not know offline and has never met, but who share interests and topics. These new contacts can be treated as information sources for the user. Twitter provides “Who to follow” on the user profile page [1]. It recommends followees who are similar to the existing followees of the target user, and followees of those followees.
In the proposed system, we use the tweet geolocation feature to recommend local followees with similar interests after the user has moved to the new place, even if those followees do not tweet in the user’s mother language. The proposed system also recommends Twitter users in the new place who have a large number of followers, which can benefit our system's user.
2.2 Hashtag Recommendation
There are multiple purposes for using hashtags. Some people use them to categorize their tweets; some use them as a mass broadcast medium for disasters or special events like elections. Hashtags are also used for brand promotion.
The proposed system recommends appropriate hashtags to the user after it translates his/her hashtags into the local language. It also looks for similar hashtags to recommend to the user and points out the most frequently used ones.
3. PROPOSED APPROACH AND DESIGN
The system has four components: the User Interests List, the Local Tweets Input, the Conceptual Fuzzy Sets Subsystem and the Output trends list (see figure 1).
[Figure 1 flowchart. User branch: Retrieve User's Tweets and Hashtags (Twitter API); Extract All User's Keywords (Nouns) (NLP API: Stanford Arabic Parser); User's Interests List (in Arabic); Translate the Interests List (Google Translate API); User's Interests List (in English) (NLP API: Wordnik API). Local branch: Retrieve Local Tweets and Hashtags (Twitter API); Extract All Keywords (NLP API: AlchemyAPI); Local Tweet Keywords. Both feed the Conceptual Fuzzy Sets Subsystem; Output Keywords and Hashtags; Search in Twitter (Twitter API); Recommend Local Followees and Hashtags.]
Figure 1. Proposed System Architecture and Design
3.1 User Interest List
The user interest list contains the user’s interest keywords (nouns) that appear in his/her last 150 tweets or retweets (see figure 2). The following procedure creates this list.
Figure 2. Example of a tweet retrieved with the Twitter API in JSON format; the example also shows tweet metadata such as text and geolocation.
3.1.1 Detect the User’s Language
To find the mother language of the user's tweets we use the Google Translate API or the Language Detection API [10]. We tested the Language Detection API by passing it the tweet (see figure 2) and obtained this result (see figure 3):
{ "data": { "detections": [ { "confidence": 0.4217687074829932, "isReliable": 0, "language": "ar" } ] } }
Figure 3. Example of using the Language Detection API; the result shows that the tweet was in Arabic.
3.1.2 Extract Nouns
We assume that our user tweets most of the time in Arabic. We use the Stanford Arabic Parser software [7] or API (Arabic Natural Language Processing) to generate the user’s interest list by extracting all nouns and some verbs (keywords) that appear in his/her tweets. We tested the software manually with the Arabic sentence “انا احب كرة القدم” and obtained this result (see figure 4).
Figure 4. Example of using Stanford Arabic Parser
We consider and select all noun types (see table 1). After extracting all nouns in Arabic, the proposed system uses the Google Translate API to translate the noun list into English. The next thing to consider is the meaning of the words; for example, “Football” has a different meaning in North America than in the United Kingdom. To address this we use the Wordnik API [9] to obtain synonyms for words with multiple meanings such as “Football” (see figures 5 and 6).
Table 1. Some Tags and Description
The system extracts the term frequency (TF) of each noun. To get a more accurate result, we weight the TF count of each noun using equation (1).
In equation (1), “n” measures the frequency of noun i in tweets j. To calculate TF-IDF we multiply TF by the inverse document frequency in the local tweets. As a result, the user’s interest list is created.
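As a rough illustration of the TF and TF-IDF weighting described above, the following Python sketch assumes a normalized term frequency (the count of a noun divided by the total number of nouns) and a smoothed logarithmic inverse document frequency computed over the local tweets. Both formulas are common textbook choices assumed here and are not necessarily the exact form of equation (1); the function names and the representation of each local tweet as a set of keywords are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(user_nouns):
    """Normalized TF: count of each noun divided by the total noun count."""
    counts = Counter(user_nouns)
    total = sum(counts.values())
    return {noun: count / total for noun, count in counts.items()}


def inverse_document_frequencies(nouns, local_tweets):
    """Smoothed IDF over local tweets (each tweet is a set of keywords).

    Assumed formula: log((1 + N) / (1 + df)) + 1, which stays finite
    even when a noun appears in no local tweet.
    """
    n_docs = len(local_tweets)
    idf = {}
    for noun in nouns:
        doc_freq = sum(1 for tweet in local_tweets if noun in tweet)
        idf[noun] = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    return idf


def tfidf(user_nouns, local_tweets):
    """TF-IDF weight per noun: TF multiplied by IDF in the local tweets."""
    tf = term_frequencies(user_nouns)
    idf = inverse_document_frequencies(tf, local_tweets)
    return {noun: tf[noun] * idf[noun] for noun in tf}


if __name__ == "__main__":
    nouns = ["football", "football", "music", "coffee"]
    tweets = [{"football", "game"}, {"music"}, {"football"}]
    for noun, weight in sorted(tfidf(nouns, tweets).items()):
        print(noun, round(weight, 3))
```

With this weighting, a noun that the user mentions often receives a high TF, while a noun that is rare in the local tweets receives a high IDF; how these two effects should be balanced for recommendation is a design decision the sketch does not settle.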
Figure 5. Example of translating an Arabic sentence to English by using Google Translate.
3.1.3 The Final User Interests List
At this stage, we have the final list of the user’s interest keywords in the local language (English), which is the input to the CFS subsystem.
3.2 Local Tweets Input
We use the Twitter geolocation feature to retrieve local tweets and hashtags. We then detect the main language of these tweets, which we assume to be English, and analyze them with AlchemyAPI to obtain details such as tweet keywords, tags, category and topic.
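One way to decide whether a retrieved tweet is local is to compare its coordinates with the user's location. The Python sketch below assumes the tweets have already been retrieved and reduced to dictionaries with hypothetical `lat` and `lon` keys, and it uses the haversine great-circle distance with an assumed 10 km radius. It does not call the Twitter API, and the radius is an illustrative value only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_local_tweets(tweets, user_lat, user_lon, radius_km=10.0):
    """Keep tweets that carry coordinates within radius_km of the user."""
    local = []
    for tweet in tweets:
        lat, lon = tweet.get("lat"), tweet.get("lon")
        if lat is None or lon is None:
            continue  # tweets without geolocation cannot be placed
        if haversine_km(user_lat, user_lon, lat, lon) <= radius_km:
            local.append(tweet)
    return local


if __name__ == "__main__":
    sample = [
        {"text": "Game tonight downtown", "lat": 43.66, "lon": -79.39},
        {"text": "Parliament Hill", "lat": 45.4215, "lon": -75.6972},
        {"text": "No location attached"},
    ]
    # User assumed to be in downtown Toronto.
    for tweet in filter_local_tweets(sample, 43.6532, -79.3832):
        print(tweet["text"])
```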
3.3 Conceptual Fuzzy Sets Subsystem (CFS)
This subsystem has two inputs: the Local Tweets Input and the User Interests List. The Local Tweets Input feeds the subsystem with the keywords, tags and topics of all local tweets. The CFS subsystem calculates the degree of similarity between the local tweets and the user interests list, and expands the user interests list based on that degree to produce its output. We use cosine similarity (2) to calculate this degree.
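$$\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \qquad (2)$$
where “a” is the user interests list and “b” is a local tweets keyword list, each treated as a vector of keyword weights.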
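A minimal Python sketch of the cosine similarity between the two lists follows. It assumes that each list is represented as a dictionary mapping keywords to weights (for example, TF-IDF weights); that representation and the function name are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two keyword-weight dictionaries.

    Keywords missing from a dictionary are treated as weight 0, so only
    shared keywords contribute to the dot product.
    """
    shared = set(a) & set(b)
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty list is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    user_interests = {"football": 0.6, "music": 0.4}
    local_keywords = {"football": 0.5, "stadium": 0.5}
    print(cosine_similarity(user_interests, local_keywords))
```

The result lies between 0 and 1 for non-negative weights: 0 when the lists share no keyword and 1 when their weight vectors point in the same direction.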
3.4 Output List
The output list is the result of the CFS subsystem. We use the output list with the Twitter search API to recommend local followees, followers and hashtags. Equation (3) is used to calculate the output trends list.
In equation (3), “U” is the user interest list and “L” is a local tweets keyword list; “n” is the number of local tweet keyword lists that are integrated with the user interest list, and “i” indexes those lists.
Figure 6. Example of using Wordnik to get all “Football” related words
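The expansion of the user interest list by sufficiently similar local keyword lists, as described in Sections 3.3 and 3.4, might be sketched as follows. This is an illustrative assumption rather than the paper's equation (3): the set-based cosine similarity, the threshold value of 0.3 and the union-based merge are hypothetical choices, as are the function names.

```python
import math


def set_cosine(u, l):
    """Cosine similarity of two keyword sets viewed as binary vectors."""
    if not u or not l:
        return 0.0
    return len(u & l) / math.sqrt(len(u) * len(l))


def expand_interests(user_interests, local_keyword_lists, threshold=0.3):
    """Merge into U every local list L_i that is similar enough to U.

    Similarity is always measured against the original user list, so the
    order of the local lists does not change the result.
    """
    base = set(user_interests)
    output = set(base)
    for keywords in local_keyword_lists:
        candidate = set(keywords)
        if set_cosine(base, candidate) >= threshold:
            output |= candidate
    return output


if __name__ == "__main__":
    U = {"football", "music"}
    L = [{"football", "soccer", "tfc"}, {"election", "vote"}]
    print(sorted(expand_interests(U, L)))
```

The resulting keyword set could then serve as the query terms for a Twitter search; how the terms are ranked or truncated before searching is left open here.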
4. RESEARCH CHALLENGES
Many recommendation systems have been implemented for Twitter. However, no recommendation system has yet been developed that considers cross-language matching between Twitter users. Our proposed idea tries to contribute to this issue by using Natural Language Processing, Conceptual Fuzzy Sets and the Google Translate API. We also face a technical challenge: the NLP APIs are written in two different programming languages.
5. CONCLUSION
In conclusion, a system connected to Twitter that gives recommendations of interests and likes would be of service to many people who have moved to a new location and are surrounded by a new language. By using Natural Language Processing and Google Translate APIs to get the most accurate translation, in conjunction with the Fuzzy Set concept, the new system will help users find connections and interests in their current community. The next step is to implement this idea, then to extend it in the reverse direction, from English to Arabic, and finally to support other widely used languages such as French and Chinese for which APIs are available.
Acknowledgment
The first author is grateful to the Saudi Cultural Bureau in Canada for its support of this research and for funding the first author's scholarship.
6. REFERENCES
[1] TWITTER. 2013. The fastest, simplest way to stay close to everything you care about. https://twitter.com/about.
[2] Krishnamurthy, B. Gill, P. and Arlitt, M. 2008. A few chirps about twitter. In Proceedings of the first workshop on Online social networks, WOSN '08, New York, NY, USA. 19-24.
[3] Hu, M. Liu, S. Wei, F. Wu, Y. Stasko, J. and Ma, K. 2012. Breaking news on twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '12, New York, NY, USA. 2751-2754.
[4] Lops, P. Gemmis, M. and Semeraro, G. 2011. Content-based Recommender Systems: State of the Art and Trends. In Recommender Systems Handbook, Ed. Ricci, F., Rokach, L., Shapira, B, pp. 73-105.
[5] Takagi, T. and Tajima, M. 2001. Query expansion using conceptual fuzzy sets for search engine. In Proceeding of the 10th IEEE International Conference on Fuzzy Systems, 2001, 1303-1308.
[6] Sakaguchi, T. Akaho, Y. Takagi, T. and Shintani, T. 2010. Recommendations In Twitter Using Conceptual Fuzzy Sets. In Proceedings of the Fuzzy Information Processing Society (NAFIPS), 2010 Annual Meeting of the North American, 12-14 July 2010, 1-6.
[7] Green, S. and Manning, C.D. 2010. Better Arabic parsing: baselines, evaluations, and analysis. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, Stroudsburg, PA, USA, 394-402.
[8] STANFORD UNIVERSITY. 2013. Arabic Natural Language Processing. The Stanford Natural Language Processing Group.
[9] WORDNIK. 2013. Wordnik has an API, and you're invited. http://developer.wordnik.com/.
[10] DETECT LANGUAGE. 2013. Language Detection API. http://detectlanguage.com/.