Identifying Security Threats On Social Networks Using Pattern Recognition
Justin K Joshuva, Farah Kandah, Ph.D.
University of Tennessee at Chattanooga, Chattanooga, TN

Abstract

Social networks such as Facebook and Twitter have become a large part of many people's lives. Users communicate with each other on these networks, blindly believing that they are talking to the intended person. This blind trust can create security threats when private or confidential information is passed to the wrong user, allowing a malicious reader to obtain that information and use it illegally. This work proposes a mathematical model for identifying such security threats using pattern recognition with the aid of Naïve Bayes classification. By examining the communication (message) history between users, patterns can be observed in the characters, words, and other symbols used in the messages. These patterns are then used to verify whether a new message was written by the same person who wrote the messages in the history.

Materials and Methods

The materials required are the message history between users and a scripting language, such as Python or R, that has packages for string analysis and pattern recognition.

Naïve Bayes algorithm

The major part of this work is the Naïve Bayes algorithm. The classes of the Naïve Bayes classification define the probabilistic model to which Bayes' theorem is applied, and they provide the basis for the classification. This model uses two classes. For x = 1, class $C_1$ is designated GOOD: the probability that the message matches the message history meets or exceeds 90%. For x = 2, class $C_2$ is designated BAD: that probability falls below 90%. Given a vertex (user) $i$ with messages $M_{i,j}$ and the two classes $C_1$ and $C_2$, Naïve Bayes determines which class $C_x$ is more likely under the assumption that the messages are independent.
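A minimal Python sketch of the two-class decision rule described above, assuming the classifier already yields the probability that a message matches the sender's history. The constant and function names are hypothetical, not taken from the original work.

```python
MATCH_THRESHOLD = 0.90  # 90% cut-off between GOOD and BAD, as stated in the text


def assign_class(p_match: float) -> tuple[int, str]:
    """Map a match probability to (x, label): x = 1 -> GOOD, x = 2 -> BAD.

    p_match is assumed (hypothetically) to be the probability that the
    message was written by the same person as the message history.
    """
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability in [0, 1]")
    if p_match >= MATCH_THRESHOLD:  # "meets or exceeds" 90%
        return 1, "GOOD"
    return 2, "BAD"
```

For example, `assign_class(0.95)` returns `(1, "GOOD")` and `assign_class(0.42)` returns `(2, "BAD")`. A score of exactly 0.90 counts as GOOD because the text says the probability must meet or exceed 90%.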

Introduction

Since the start of civilization, humans have devised different ways to communicate with each other: cave paintings, cuneiform, hieroglyphics, scrolls, letters, and electronic mail. Communications that once took many days now reach someone on the other side of the world in seconds. With the rise of social networks such as Facebook, Twitter, MySpace, and LinkedIn, along with many other kinds of social media, communication and connection between people have grown enormously. The downside of this communication is security risk, which has risen at the same rate: identity theft, information theft, and many other forms of attack. Using pattern recognition to recognize and identify security threats is a novel idea, based on the premise that any behavior that can be observed has a pattern. A person reveals such patterns through the way certain words are typed, the style in which words and punctuation are used, and the use of emoticons. This work examines communication between people on social networks. By looking at the semantics, patterns, and style of the language in the messages between two people, one can determine who wrote a particular message using Naïve Bayes classification and string pattern matching.
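To make the notion of a writing "pattern" concrete, here is a minimal Python sketch that counts three of the cues named above (words, punctuation marks, and emoticons) in a single message. The regular expressions and the function name are simplified assumptions, not the patterns used in the original work; a practical emoticon detector would need a much larger set.

```python
import re
from collections import Counter

# Hypothetical, simplified emoticon pattern: eyes, optional nose, mouth,
# e.g. :)  ;-)  =D  :P
EMOTICON_RE = re.compile(r"[:;=]['-]?[()\[\]DdPp/|]")
# Words, allowing one internal apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
PUNCTUATION = set("!?.,;:")


def style_features(message: str) -> dict[str, Counter]:
    """Count words, punctuation marks and emoticons in one message."""
    emoticons = EMOTICON_RE.findall(message)
    # Remove emoticons first so their characters are not counted as punctuation.
    rest = EMOTICON_RE.sub(" ", message)
    words = [w.lower() for w in WORD_RE.findall(rest)]
    marks = [ch for ch in rest if ch in PUNCTUATION]
    return {
        "words": Counter(words),
        "punctuation": Counter(marks),
        "emoticons": Counter(emoticons),
    }
```

Comparing these counts across a user's message history is one simple way to describe that user's style; the choice of cues here only mirrors the ones listed in the paragraph above.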

Packages used:
• rjson – to convert from JSON format to a data frame in R
• RTextTools – for classification
• text2vec – text mining framework
• tau – text analysis utilities
• tidytext – text mining package for word processing
• tm – framework for text mining
• SnowballC – word stemming algorithm
• ggplot2 – graphics package
• wordcloud – required for word clouds

Methods: Semantic analysis is the process of relating syntactic structures, from the level of phrases up to the level of the writing as a whole, to their language-independent meanings. The work proceeds in six steps:
1. Use a Python script to retrieve the users' messages in JSON format.
2. Convert the JSON to CSV using R and the rjson package.
3. With the aid of the packages listed above, prepare the file by removing punctuation, numbers, extra whitespace, and common elements such as stop words.
4. Identify the patterns.
5. Apply the Naïve Bayes algorithm, first creating the training and testing sets.
6. Analyze the results.
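The poster performs the cleaning in R with tm and related packages. As an illustration only, the following Python sketch mirrors step 3 (lower-casing and removing punctuation, numbers, extra whitespace, and common words) and the training/testing split of step 5. The stop-word list, the 25% test fraction, the fixed seed, and the function names are assumptions.

```python
import random
import re
import string

# Hypothetical, deliberately tiny stop-word list (tm ships a much larger one).
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "is", "it", "i", "you"}

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(message: str) -> list[str]:
    """Step 3: lower-case, strip punctuation and digits, drop stop words."""
    text = message.lower()
    text = text.translate(_PUNCT_TABLE)  # remove punctuation
    text = re.sub(r"\d+", " ", text)     # remove numbers
    tokens = text.split()                # split() also collapses whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def train_test_split(items, test_fraction=0.25, seed=0):
    """Step 5: shuffle reproducibly, then hold out a fraction for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

For example, `clean_text("Meet me at 5:30,   OK?")` returns `["meet", "me", "at", "ok"]`. Using a seeded `random.Random` keeps the split reproducible between runs.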

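$$\arg\max_x \, p(C_x \mid M_{i,1}, M_{i,2}, \ldots, M_{i,N}) = \arg\max_x \frac{p(C_x \cap M_{i,1} \cap M_{i,2} \cap \cdots \cap M_{i,N})}{p(M_{i,1}, M_{i,2}, \ldots, M_{i,N})}$$

The denominator of this equation is positive and is the same for every class for a given user, so it does not change which class attains the maximum and can be ignored when the probabilities are compared. Under the naive assumption that the messages are conditionally independent given the class, the numerator factors as

$$p(C_x) \prod_{a=1}^{N} p(M_{i,a} \mid C_x),$$

which reduces the original problem to estimating $p(M_{i,a} \mid C_x)$ for all $1 \le a \le N$. The Naïve Bayes classifier is then applied over the user's friendships, taking the weight of each friendship into account, to obtain the final probability.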
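The reduced problem can be sketched as a small multinomial Naïve Bayes classifier. This is an illustrative Python version, not the authors' R implementation. It assumes each training message is already tokenized and labeled (the labels in the example are hypothetical author names), takes the words of a message as the features, uses add-one (Laplace) smoothing to estimate the conditional probabilities, and works in log space so that a product over many words does not underflow. The `posterior` helper normalizes across classes, which restores the dropped denominator; that is only needed when an actual probability (for instance, for the 90% threshold) is required rather than just the argmax.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Fit priors p(C_x) and per-class word counts from (tokens, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {c: n / total for c, n in class_docs.items()}
    return priors, word_counts, vocab


def log_score(tokens, label, model, alpha=1.0):
    """log p(C_x) + sum of log p(word | C_x), with add-alpha smoothing."""
    priors, word_counts, vocab = model
    counts = word_counts[label]
    denom = sum(counts.values()) + alpha * len(vocab)
    score = math.log(priors[label])
    for t in tokens:
        if t in vocab:  # words never seen in training are ignored
            score += math.log((counts[t] + alpha) / denom)
    return score


def classify(tokens, model, alpha=1.0):
    """Return the class that maximizes the unnormalized posterior."""
    priors = model[0]
    return max(priors, key=lambda c: log_score(tokens, c, model, alpha))


def posterior(tokens, model, alpha=1.0):
    """Normalized p(C_x | tokens), computed with the log-sum-exp trick."""
    logs = {c: log_score(tokens, c, model, alpha) for c in model[0]}
    top = max(logs.values())
    z = sum(math.exp(v - top) for v in logs.values())
    return {c: math.exp(v - top) / z for c, v in logs.items()}
```

For example, after `model = train_nb([(["hey", "lol"], "alice"), (["lol", "haha"], "alice"), (["greetings", "regards"], "bob")])`, the call `classify(["lol"], model)` returns `"alice"`.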

Results

There are various ways to observe the results. Using the packages listed above, the most frequent words can be displayed as a word cloud or a bar graph, and the emotional style of the messages can be shown in different graphics.
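The frequency counts behind a word cloud or bar graph are simple to compute. A minimal Python sketch, assuming messages have already been cleaned into token lists; the plain-text bar chart only stands in for the ggplot2 and wordcloud graphics used in the poster, and the function names are hypothetical.

```python
from collections import Counter


def top_words(token_lists, k=10):
    """Return the k most frequent tokens across all messages as (word, count)."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(k)


def text_bars(pairs, width=20):
    """Render (word, count) pairs as a plain-text horizontal bar chart."""
    if not pairs:
        return []
    peak = max(n for _, n in pairs)
    label_width = max(len(w) for w, _ in pairs)
    return [
        f"{w:<{label_width}} {'#' * max(1, round(width * n / peak))} {n}"
        for w, n in pairs
    ]
```

For example, `top_words([["lol", "hey"], ["lol", "see"], ["lol"]], k=2)` returns `[("lol", 3), ("hey", 1)]`; `Counter.most_common` keeps tied words in the order they were first seen, which is why "hey" precedes "see".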

Conclusion

Naïve Bayes is a technique for constructing classifiers. It is not a single algorithm for training classifiers but a family of algorithms based on a common principle: each feature is assumed to be independent of the others given the class. In this work it is used to compute the probability that new messages match the message history and to classify them according to the probability of that match.

Limitations: There are many limitations to this particular project.
• Retrieving the message history is difficult because there is no easy access to Facebook's messaging service.
• The work requires knowledge of R/Python, Naïve Bayes classification, and other statistical analysis.
• There is little research in this "exact" field. Much research exists on semantics in Twitter, but because tweets are limited to a fixed number of characters, the writing style in them differs greatly from the individual's normal writing style.
• Building the training and test sets for text is difficult.
