Gender Prediction on Twitter Using Stream Algorithms with N-Gram ...

Recommend Documents

Author Gender Prediction in an Email Stream Using Neural Networks

William Deitrick, Zachary Miller, Benjamin Valyou, Brian Dickinson, Timothy Munson, Wei Hu*. Department of Computer Science, Houghton College, Houghton, ...

Language-independent Gender Prediction on Twitter - Association for ...

Aug 3, 2017 - the bag-of-words model when applied to different languages, showing .... a location defined, and red, green and blue color component intensity ...

Discriminating Gender on Twitter - Semantic Scholar

Discriminating Gender on Twitter. John D. Burger and John Henderson and George Kim and Guido Zarrella. The MITRE Corporation. 202 Burlington Road.

Prediction of Topic Volume on Twitter - Kno.e.sis

H.2.8 Database Management: Database Applications - Data. Mining. 1http://www.twitter.com .... ing topic-relevant tweets, which is in turn a good predictor for the user's future ... the regressand. Compared with other tools, linear regression.

Discriminating Gender on Twitter - Semantic Scholar

sparked a great deal of research interest in aspects of social media, including ... Twitter is a social networking and micro-blogging plat- form whose users publish short ..... name, and it too is informative (77.1%), as is the user's self-assigned

Language Independent Gender Classification on Twitter

{jalowibd,buy,psyu}@cs.uic.edu ... color-based features extracted from Twitter profiles (e.g., .... best prediction accuracy among the options that we consid- ered.

Prediction of Topic Volume on Twitter - Kno.e.sis

interested in predicting Twitter users' behavior of generating topic-relevant tweets ... sign strategy which maximizes the reach of viral messages) and many more ...

Crystal structure prediction using evolutionary algorithms

gauche, and we also found a number of low-energy metastable chain (previously obtained in. Ref. 57) and layered structures, as well as another 3D-polymeric ...

Protein Structure Prediction Using Evolutionary Algorithms ... - LCC

The black cube represents the current location. (Right) Relative moves in a cubic lattice. The black cubes represent the current location and the previous one.

Prediction of Pathological Subjects Using Genetic Algorithms

Jan 2, 2018 - GA is successfully applied to medical prediction problems and has ...... Journal of the American Podiatric Medical Association, vol. 88, no. 4, pp.

Gender Prediction Using Browsing History

The effectiveness of many web applications such as online marketing, .... tion then we use an automated classifier to classify the page into one of standard.

Genomic-enabled prediction with classification algorithms - Nature

Jan 15, 2014 - penalisations according to the misclassification group (Fernández et al., 2011) ...... We also thank HPC-Cluster CCT-Rosario for computing time.

Personalized Filtering of the Twitter Stream

work in the recent times, well known ones include Facebook, MySpace, ... Facebook, Linkedin). ..... 15 https://developers.facebook.com/docs/reference/api/ ...

WARNINGBIRD: Detecting Suspicious URLs in Twitter Stream

BIRD, a suspicious URL detection system for Twitter. In- stead of focusing on the ..... tio, because they can use Sybil followers or buy followers. Therefore, instead of .... cause it shows the best accuracy values with our dataset. (see Table 4).

Semi-Supervised Spam Detection in Twitter Stream

Feb 2, 2017 - required by the detection module are updated in batch mode based on the tweets ... which is two orders of magnitude higher than that of email.

Geolocation Prediction in Twitter Using Location ... - Google Sites

location-based recommendation (Ye et al., 2010), crisis detection and management (Sakaki et al., ... Section 2 describes

Enhancing Link Prediction in Twitter using Semantic User Attributes

5 Dr. Ahmed Zewail St., Postal:12613,Giza,Egypt. +202 3567 9064 ... dataset of 2.974k users on the Twitter social network. The proposed model considers ...

Stream Prediction Using A Generative Model Based ... - Ryen W. White

Stream Prediction Using A Generative Model Based On. Frequent Episodes In Event Sequences. Srivatsan Laxman. Microsoft Research. Sadashivnagar.

Performance Prediction of a Free Stream Tidal Turbine with ...

2nd International Conference on Ocean Energy (ICOE 2008), 15th â 17th October 2008, Brest, France. 1 ... Free stream tidal turbines are a source of growing interest in the marine ..... MATLAB code calls the main BEM program at various.

USING TWITTER 1 Using Twitter to Informally Assess ...

This study examined the use of Twitter in a graduate-level course to informally assess student progress ... been said that President Obama skillfully used social media, particularly Twitter, to engage .... Political figures included President Barack.

Talking Politics on Twitter: Gender, Elections, and Social Networks

664218SMSXXX10.1177/2056305116664218Social Media + ... As campaign discussions increasingly circulate within social media, it is important to understand ...

Prediction Model for Influenza Epidemic Based on Twitter Data - ijarcce

International Journal of Advanced Research in Computer and Communication Engineering. Vol. 3, Issue ... Keywords: Twitter APIs, Markov Chain State Model, BOWs, Time series classification. ..... tweets retrieved by the search operator (like).

Bayesian Twitter-based Prediction on 2016 US Presidential ... - arXiv

Nov 2, 2016 - model for 2016 U.S. Presidential Election based on Twitter data. We use ... and Democratic candidates use Twitter as their campaign media. Previous ..... [3] Jahanbakhsh, Kazem; Moon, Yumi, The Predictive Power of Social.

A Survey on Data Stream Clustering Algorithms - Google Sites

The storage, querying, processing and mining of such data sets are highly .... problems, a novel approach to manipulate

Gender Prediction on Twitter Using Stream Algorithms with N-Gram ...

Download PDF

2 downloads 0 Views 112KB Size Report

Comment

Zachary Miller, Brian Dickinson, Wei Hu. Department of Computer Science, Houghton College, Houghton, USA. Email: [email protected]. Received July 26 ...

International Journal of Intelligence Science, 2012, 2, 143-148 http://dx.doi.org/10.4236/ijis.2012.224019 Published Online October 2012 (http://www.SciRP.org/journal/ijis)

Gender Prediction on Twitter Using Stream Algorithms with N-Gram Character Features Zachary Miller, Brian Dickinson, Wei Hu Department of Computer Science, Houghton College, Houghton, USA Email: [email protected] Received July 26, 2012; revised August 30, 2012; accepted September 10, 2012

ABSTRACT The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of the text through subtle variations in the writing styles that occur between gender, age and social groups. Such information has a variety of applications including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to identify the gender of users on Twitter using Perceptron and Naïve Bayes with selected 1 through 5-gram features from tweet text. Stream applications of these algorithms were employed for gender prediction to handle the speed and volume of tweet traffic. Because informal text, such as tweets, cannot be easily evaluated using traditional dictionary methods, n-gram features were implemented in this study to represent streaming tweets. The large number of 1 through 5-grams requires that only a subset of them be used in gender classification, for this reason informative n-gram features were chosen using multiple selection algorithms. In the best case the Naïve Bayes and Perceptron algorithms produced accuracy, balanced accuracy, and F-measure above 99%. Keywords: Twitter; Gender Identification; Stream Mining; N-Gram; Feature Selection; Text Mining

1. Introduction Social networking is one of the fastest growing industries on the web today [1]. Structured to accommodate personal communication across large networks of friends, social networks produce an enormous amount of usergenerated data. The open availability of this data on public networks, particularly Twitter, provides a good opportunity to research the unique characteristics of informal language. As such, Twitter has become the subject of many studies seeking to obtain useful information from user-generated tweets. Some of the topics for this research include the determination of gender, age, and geographical location of Twitter users. Any information that can be gleaned from authorship may have applications across a variety of fields; for instance, gender and age identification have applications in marketing, advertising, and legal investigation [2]. Collecting such information from Twitter does, however, have unique challenges. Unlike traditional authorship analysis problems which are based on samples hundreds of words in length [3], the analysis of Twitter is hindered by the 140 character limit on tweets. Other difficulties include both accidental and purposeful misspellings, and internet slang. However, certain distinctive traits, which have emerged as a result Copyright © 2012 SciRes.

of the limitations of Twitter and informality of social networks, provide the possibility for accurate analysis. Of particular interest among these characteristics is the proliferation of informal acronyms, emoticons, and purposeful misspellings. Acronyms such as, “lol”, “rofl”, and “omg” and emoticons like “=P”, “