Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 95 (2016) 335 – 344

Complex Adaptive Systems, Publication 6 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 2016 - Los Angeles, CA

We Are What We Generate - Understanding Ourselves Through Our Data

Muhammad Fahim Uddin a,*, Jeongkyu Lee b

a Graduate Student, School of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT, 06604, U.S.A.
b Advisor, Associate Professor, School of Computer Science and Engineering, University of Bridgeport, Bridgeport, CT, 06604, U.S.A.

Abstract

We tend to exhibit ourselves through the data we share about ourselves, including likes, friendships, follows, dislikes, pictures, audio, videos, causes, blogs and sites. Big data companies already use such data to create customized ads and marketing tactics. However, because such data is unstructured and noisy, its utilization and research are still at an early stage. In this paper, we elaborate on the idea of understanding individuals through the lens of the data they produce, in the context of our main research work on the Predicting Educational Relevance For an Efficient Classification of Talent (PERFECT) algorithm engine. We illustrate some of the research problems relevant to such data and identify one research problem as the ground for this paper. We present a subset of our framework, including the algorithm and math constructs, for the problem we identify. We conclude that such analytics and cognitive research can help to improve education, healthcare, the job economy, crime control, etc. Thus we coin the phrase “we are what we generate” with our work in this paper. We suggest future work and opportunities in relevant directions.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of scientific committee of Missouri University of Science and Technology.

Keywords: Social networking data; data mining; personality computing and prediction; Big Five Personality Traits model

* E-mail address: [email protected]

1877-0509 © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of scientific committee of Missouri University of Science and Technology doi:10.1016/j.procs.2016.09.343


1. Introduction

Big data mining and analytics have grown tremendously in the last two decades. They have positively affected many industries and research communities by making use of the knowledge and valuable information extracted from unstructured or semi-structured big data. With the ease of use of social networking sites such as Facebook, Twitter and LinkedIn, we produce a tremendous amount of data that helps in understanding people’s personality. We are all data points, data generators, data consumers and data collectors. With so much raw data in hand, we need to improve the way we analyze and mine data and extract new knowledge and insights. Prediction in big data is a key feature that industry depends greatly upon. In our research, we correlate personality prediction with an individual’s role in the academic and real world to provide useful insights for the future.

Personal data is rapidly growing into a multibillion-dollar business†. More and more firms are entering this business to expand their client bases. From a business perspective, firms view the strategic use of personal data as a direct way to align products with consumer demand, a step that is critical to creating a business niche. Hackers, likewise, strategically use malware to steal personal data from various institutions‡. Today's big data and social networking platforms have revolutionized the way we exhibit ourselves and our personality attributes. The Five Factor Model (FFM) [1,2] of personality uses personal data to classify an individual into one of the OCEAN personality traits, classifications that business entities can subsequently exploit. Firms’ fascination with personal data has led to a surge in the number of blog sites and news posts from various organizations [3]. However, understanding such data further, to extract specific knowledge and an understanding of the industry, is still an open research area.

The OCEAN model [4] analyzes the personality of technology users along five basic dimensions: openness, conscientiousness, extraversion, agreeableness and neuroticism. While the five factor model is widely used, it has some limitations. For instance, the factors are not independent of each other: some show a negative correlation, as noted between neuroticism and extraversion. The model also fails to explain all human personalities. Features that the OCEAN model does not take into account include religiosity, sexiness, conservativeness and honesty§. Therefore, the sole reliance of data companies on the FFM traits model to understand user personalities is inaccurate. An integration of the FFM with such additional personality features, however, can provide an accurate reflection of the aspirations and objectives of an individual. This understanding can be applied in guidance and counseling to help learners make good career choices. The integrated data model can also be used to facilitate growth at personal and institutional levels.

Whenever we use the internet, we share some of our details, information or records, knowingly or unknowingly. These are known as digital footprints [5]. This information is available to other organizations, which use it to learn about the user, the user's likes and dislikes, and other personal plans. As we use different devices to connect to the internet, our choices and other information are saved and tracked through these devices. Research shows that the personality type of a user can be predicted with up to 61% accuracy using the browsing history on a mobile phone [6]. According to [7], personality theory is the study of the differences in the way people think, behave and process information. The study of personality is therefore a component of a prediction model. Research shows that sociality motivation and effectance motivation can predict situational, cultural and developmental anthropomorphism [8]. The theory of anthropomorphism helps to define human-like behavior.

The vast amount of publicly available data on social media can also be collected and analyzed to understand personality. Research shows that the Big Five personality traits can help to predict users’ personality type by analyzing the data available on Facebook and identifying the correlations within it. This personality identification can be further used in the marketing domain [9]. According to the authors of [10], humans are characterized mainly by nine psychological traits: autonomy, imitation, intrinsic moral value, moral accountability, privacy, reciprocity, conventionality, creativity and authenticity of relation. Human behavior on social networking sites reveals personality type. Studies show that profile pictures on social networking sites such as Facebook and Twitter can be analyzed to identify a user’s personality traits by applying linear regression with Elastic Net regularization [11]. Researchers have been studying human behavior for more than 100 years. Clinical psychologists and psychiatrists



† S. Kroft, "The Data Brokers: Selling your personal information", Cbsnews, 2014. [Online]. Available: http://www.cbsnews.com/news/the-data-brokers-selling-your-personal-information/. [Accessed: 05- Feb- 2016].
‡ J. Pagliery, "Criminals use IRS website to steal data on 104,000 people", CNNMoney, 2015. [Online]. Available: http://money.cnn.com/2015/05/26/pf/taxes/irs-website-data-hack/. [Accessed: 05- Feb- 2016].
§ "Bipolar subtypes have differing personality traits", Springer Healthcare News, vol. 1, no. 1, 2012.


are studying human behavior, which reflects our personality [12]. The variety of individual personalities combines with environmental and developmental perspectives to produce a complex configuration of behavior [13]. The accurate prediction of human personality has commercial benefits. Studies have shown that data collected from social networking sites and from behavioral studies of online game players contains strong personality-related information [14]. This information can be used to improve commercial recommender systems. Psychology is the study of the human mind and behavior [15]; combining it with algorithms helps us to predict human behavior. Recent research has shown that a Recurrent Neural Network based on Personality and Topic (RNNPT) can generate a user’s social media status from the user's personality and the topic [16]. This information can help in understanding human behavior more accurately.

In our previous publication [17], we extended the general model of 3 V’s to 7 V’s and elaborated their importance for any data analytics challenge. Table 1 provides a snapshot.

Table 1 – 7 Vs of Big Data

Type          Description
Volume        The size it is being generated in
Velocity      The speed it is being generated with
Variety       Various forms of data
Veracity      Truthfulness of data
Validity      Correctness and accuracy of data
Volatility    Retention relevance and importance
Value         Desired outcome

1.1 Who are we?

Mahatma Gandhi once said, “A man is but the product of his thoughts. What he thinks, he becomes”. Basically we are all the same: we belong to one super class known as Human, and our needs are very much alike, such as food, survival, success, happiness, wealth, family, peace and others in society. Yet we tend to act differently in the real world because of the variety of circumstances and backgrounds in which we evolve. Thus, we end up with slightly different personalities even from our own family members, such as siblings. By nature we are difficult to change, though change is possible, and as we evolve we may change at any age. In today’s digital world, with the Internet and the digital footprints we generate on social networking sites, we have become data generators. This accumulation of tremendous amounts of data has triggered a new research trend known as Personality Computing and Prediction, which in particular uses and mines social networking data. Such research and insights help in understanding our personality makeup with computer-level accuracy and reliability, and further help various industries to see us as relevant.

1.2 Our Data and our Digital Footprints

The Big Five Model [1,2,18], known as the OCEAN traits, has been used in much research to predict and categorize personality types. Figure 1 shows the traits of this model. It has been used for decades to place individuals in one of the categories based on mathematical analysis [19].

Figure 1 – OCEAN Traits


1.3 Open Research Issues

With data accumulating in the range of petabytes, many questions have arisen in the psychology and computer science research communities about how to use such data. As with other mathematical problems, computers tend to judge humans more accurately than humans judge one another [20]. Humans are too analytical, emotional and biased, which makes them slow at calculation. Given a large amount of data and asked to make a prediction, a person would take a very long time, which is neither cost- nor time-efficient. We would rather use a computer for such analysis: computers can store large amounts of data, perform the calculations, run the algorithms and produce results far faster. With this in mind, we consider the following research issues to be the most pressing today. We pick the last one and present it in this paper, in the next section.

a. We (users) are producing lots of data and we (industry) are mining and analyzing it, but technology and algorithms are not yet mature enough to detect false data that may mislead future decisions.
b. We (users) now know that the data we willingly produce, together with our digital footprints, may lead to privacy issues, and we (industry) do not yet have methodologies that fully address them. What if we (users) stop producing such data? Would we (industry) go out of business sooner or later?
c. Personality prediction and mining can also manipulate our (users') personality. Is that acceptable?
d. Are we using personality prediction data in correlation with academic and career data to understand our own talent for academics and careers?
e. Are we correlating the OCEAN Traits with the specialized personality traits to evolve into who we are?

1.4 Problem at Hand

Are we doing data mining and analysis for all dimensions to identify both positive and negative talents? This is the question we attempt to answer in this paper, and we present our proposed framework, which is work in progress.

1.5 Proposed Solution

We propose to enhance the OCEAN Traits predictive models by incorporating academic data, career data and other relevant data to predict positive and negative talent with improved accuracy. Although 100% prediction accuracy is never possible, we achieve efficiency and reliability of 75% and above.

1.6 Our research work in context

Our research focuses on developing algorithms and a data mining model to improve the student decision-making process and to create good fit students and good fit candidates who perform better in the academic and real world. Figure 2 below shows the process of our main research and model. We have developed a series of sequential algorithms based on stochastic probability distributions, with associated math modelling and constructs, titled “Predicting Educational Relevance For an Efficient Classification of Talent” (PERFECT) Algorithm Engine (PAE). As of the writing of this paper, it consists of the following three algorithms:

1. Noise Removal and Structured Data Detection (NR-and-SDD)
2. Good Fit Student (GFS) Prediction
3. Good Fit job Candidate (GFC) Prediction

Figure 2 shows the framework for the three main algorithms currently in our PAE. As we can see, we start by collecting the unstructured data set and storing it in a repository, and we correlate it with


OCEAN Big Five traits** to remove noise, detect structured data (SDD = Structured Data Detection) and perform Enhanced Personality Feature Extraction (EPF-E); this is Algorithm #1. Its outputs (SDD, EPF), along with academic features, are fed to Algorithm #2 (GFS, Good Fit Student). Next, the GFS output, along with job seeker features, is fed to Algorithm #3 (GFC). The smiley face indicates a Good Fit Candidate that our algorithm has identified/produced/predicted.

Figure 2 – PAE Framework
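The data flow of the three PAE stages described above can be illustrated with a minimal Python sketch. It only shows how the output of each stage feeds the next; it is not the authors' implementation. The function names, the keyword-matching rule used for noise removal, and the dictionary-shaped profiles are all assumptions made for illustration.

```python
def nr_and_sdd(posts, trait_keywords):
    """Stage 1 stand-in (NR-and-SDD): drop posts that match no trait keyword
    (treated here as noise) and count keyword hits per OCEAN trait.
    The keyword rule is a hypothetical placeholder for the real algorithm."""
    kept = []
    counts = {trait: 0 for trait in trait_keywords}
    for post in posts:
        words = set(post.lower().split())
        hits = [t for t, kws in trait_keywords.items() if words & kws]
        if hits:
            kept.append(post)
            for t in hits:
                counts[t] += 1
    return kept, counts


def good_fit_student(personality_features, academic_features):
    """Stage 2 stand-in (GFS): combine stage-1 output with academic features."""
    return {"personality": personality_features, "academic": academic_features}


def good_fit_candidate(gfs_profile, job_seeker_features):
    """Stage 3 stand-in (GFC): combine stage-2 output with job seeker features."""
    return {**gfs_profile, "job_seeker": job_seeker_features}


def run_pae(posts, trait_keywords, academic_features, job_seeker_features):
    """Chain the three stages in the order given in the text."""
    _, personality = nr_and_sdd(posts, trait_keywords)
    gfs = good_fit_student(personality, academic_features)
    return good_fit_candidate(gfs, job_seeker_features)
```

A real engine would replace each stand-in with a predictive model; the point here is only the sequential hand-off from SDD/EPF to GFS to GFC.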

2. Related Study

FFM (OCEAN) [21,22] is widely used in research to predict and classify individuals based on their behavior. The study in [1] discusses its measurement and the Big 6 and Big 2. One of the challenges in FFM is the differing interpretation of its measurements, which introduces some complexity. The study in [2] was prepared on the Big Five Inventory (BFI) for better assessment of personality traits. One of its focuses is language translation for individuals from many other

** https://en.wikipedia.org/wiki/Big_Five_personality_traits


nations. Its aims include understanding and verifying whether the FFM holds up across cultures and whether individuals’ profiles are valid in those nations. Social media, including but not limited to Facebook, Twitter, LinkedIn, Flickr, Pinit, Google+ and various blogs, tend to capture and store personality features that are indirectly hidden in the posts users generate daily, in various industry contexts and discussion threads. The study in [23] examines behavior traits in predicting personality and analyzes the extent of their relevance. The authors use Facebook and Twitter data as test data and conclude that predicting personality from behavior analysis achieves the same success as text analysis. They also take friend and follower behavior into consideration. Predicting personality from micro-blogging is a hot research area. The study in [4] includes 444 users in its test. The authors use the real-time online behavior of individuals to derive an algorithm based on regression techniques from data mining. They show improved accuracy relative to related studies and provide a good reflection of online attitudes. They suggest continuing the work with other parameters, such as mental state and social networking behaviors. Numerous studies [24,25,18] have shown that combining data mining and personality traits such as FFM with social networking data is very useful for categorizing personalities. Machine learning and its predictive power [26,27] have emerged and evolved into many sophisticated algorithms. The study in [28] uses a machine learning technique based on Rough Sets to extract rules for its prediction work; the authors claim better efficiency than Support Vector Machine (SVM) methodologies [29,30]. Various comprehensive reviews [31,19] of the search for individual personality types for constructive real-world applications have been conducted. They support the future mining of data to address the challenges and issues at hand.

Just as social networking data contributes to personality prediction opportunities and research, educational data mining and big data in education [32] contribute to the prediction and classification of factors, in huge data sets, behind the success and failure of students and the educational system. A stream of research work [33,34] has supported and advanced data mining techniques and algorithms [35,36] to improve prediction accuracy. Various case studies [37,38] support predicting dropouts and improving student performance. Student behavior prediction [39,40] plays a role in this type of data mining and yields predictive metrics and measures. Student recommender systems [41] have matured through various research in student data mining [42,43].

3. Problem Illustration and Proposed Solution and Framework Layout

Are we correlating the OCEAN Traits with the specialized personality traits to evolve into who we are? There has been significant progress in the research and development of tools for personality computing, prediction and machine learning, with artificial intelligence based algorithms to process the relevant data. This has enhanced the way users in online communities are observed and understood. Big data companies such as Facebook, Twitter, Amazon, Google, Netflix and others use data to create recommendations. LinkedIn uses it to recommend jobs and to help employers reach professionals looking for their next job. In our in-depth study of the literature of the last 10 years, we found that very little to no work has been done to use personality computing in correlation with random features to test for all possibilities. In other words, we focus on mining predictive measures that may reveal something new on which we can act.
For example, we may allow someone to become a police officer based on interviews, background checks and data analysis, yet without having looked for hidden factors that may reveal a tendency to become very aggressive and to abuse power. Figure 3 illustrates our vision and how the solution can be applied. We start by looking at various unstructured data sources, including Facebook posts, Twitter posts, LinkedIn, blogs, etc., and gather the data in an acceptable format for further processing. How to collect data is outside the scope of our research; however, we use APIs and direct downloads from various vendors who provide data for research purposes. We omit the details of the math model and show some constructs to support our results and work. We define our goal as shown by Equation 1:

$$\mathrm{Ware} = \bigcup_{n=1}^{m} \left( \mathrm{Ocean}_n \cap SD_n \right) \tag{1}$$

where Ocean represents the categories of personality traits and SD is the signal data, from which noise or irrelevant components have been removed by Equation 2:

$$NSA = \sum_{n=1}^{\infty} \left( TD_n - ND_n \right) \tag{2}$$

where NSA represents the Noise Separation Algorithm, TD is the total data set and ND is the noisy data.
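To make the two constructs concrete, here is a small Python sketch that reads Equation 1 literally as a union of per-trait intersections and reads Equation 2 as a set difference (total data minus noisy data). The set-based reading of Equation 2, the function names and the toy feature labels are assumptions for illustration; the paper does not spell out the underlying data types.

```python
def separate_noise(total_data, noisy_data):
    """Equation 2 read as a set difference: signal data = TD minus ND.
    Treating TD and ND as sets is an assumption of this sketch."""
    return set(total_data) - set(noisy_data)


def ware(ocean_by_trait, signal_by_trait):
    """Equation 1: union over traits n of (Ocean_n intersect SD_n)."""
    result = set()
    for trait, ocean_features in ocean_by_trait.items():
        result |= set(ocean_features) & set(signal_by_trait.get(trait, ()))
    return result


# Hypothetical toy data: features expected per trait, and raw data per trait.
ocean = {
    "openness": {"curious", "creative"},
    "extraversion": {"outgoing", "talkative"},
}
total = {
    "openness": {"curious", "creative", "spam"},
    "extraversion": {"talkative", "ad"},
}
noise = {"openness": {"spam"}, "extraversion": {"ad"}}

signal = {t: separate_noise(total[t], noise[t]) for t in total}
ware_features = ware(ocean, signal)  # features both expected and observed
```

Under this reading, Ware keeps only the trait features that are actually present in the cleaned data, which matches the stated role of SD as noise-free input.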

Figure 3 – Framework to evolve into Ware and improved reliability

Next, we set our goal to evolve into a Unique Personality Features Model (UPFM) to granularly specify the type of personality we found by processing the data set. UPFM is related to Ware by the following short version of the algorithm.

Exhibit 1 – You are what you generate – Ware Algorithm

BEGIN
    // We produce Signal Data and store it in our repository (database)
    WHILE Data Set is Valid REPEAT {
        SD ← IMPORT Signal Data, while cleaning noise from it
    }
    END WHILE
    // We import some personality factors that are predefined. UPF = Unique Personality Features
    UPF_test[] ← SAMPLE UPF
    // We save sample Ocean traits in an array
    O_test[] ← SAMPLE OCEAN
    // We set the Threshold value to be 0.8 (80%) and REL to be 0.7 (70%)
    Tr ← {0:1}
    REL ← {0:1}
    // We then correlate UPF with the Ocean traits to see the likeliness, for the contents we imported in our SD
    FOREACH (UPF_test[i] š O_test[i]) DO {
        WHILE ((Tr) is VALID) DO {
            // Here we use our evolution factor (Ǐ) that we developed to improve ST into ST’
            Ware’ = Ǐš(ST)
            Ware = Ware’
        }
        Next i
    }
    // We finally use our Reliability function (ě)
    REL ← (Ware) ě (Ocean)
END

In Figure 4, we show the reliability in % against the number of features included in our test. We observe that as we increase the number of features in the OCEAN model vs. Ware, the reliability improves. We notice a slight non-linearity at some points, as shown, which needs further research and testing.

Figure 4 – Ocean Vs Ware
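The correlation step of Exhibit 1 can be sketched in Python as follows. The paper does not define its evolution factor or its reliability function, so this sketch swaps in two plainly named stand-ins: Pearson correlation for the "correlate UPF with Ocean" step, and the share of feature pairs that pass the threshold as a rough reliability figure. Only the threshold Tr = 0.8 and the target REL = 0.7 come from the exhibit; the function names and the sample vectors in the usage lines are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 when either sequence has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def ware_loop(upf_test, o_test, tr=0.8, rel=0.7):
    """For each (UPF sample, OCEAN sample) pair, keep the pair's index when
    the correlation reaches the threshold tr. The returned share of accepted
    pairs is a stand-in for the paper's undefined reliability function."""
    accepted = []
    for i, (upf, ocean) in enumerate(zip(upf_test, o_test)):
        if pearson(upf, ocean) >= tr:
            accepted.append(i)
    share = len(accepted) / len(upf_test) if upf_test else 0.0
    return accepted, share, share >= rel


# Hypothetical sample score vectors, one per feature.
upf_samples = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
ocean_samples = [[2, 4, 6, 8], [1, 2, 3, 5], [1, -1, 1, -1]]
accepted, share, meets_rel = ware_loop(upf_samples, ocean_samples)
```

With these sample vectors, two of the three pairs pass the 0.8 threshold, so the accepted share falls just short of the 0.7 target; adding more well-correlated features would raise it, which is the kind of trend Figure 4 reports.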


In Figure 5, we show the evolution process of both models. We see improved prediction probabilities for Ware as compared to Ocean model.

Figure 5 – Ocean Vs Ware evolution process

4. Conclusions and Future Work

In this paper, we present a subset of our overall research work, as mentioned in the introduction. We propose a framework, including a math model and an algorithm, that we apply to unstructured data. We also cleaned the data to remove noise. We observe that as the number of features increases, the reliability of prediction improves. We compared our results with the OCEAN model and obtained slightly better prediction results. This is work in progress, and we aim to build upon it to improve our results with more data sets in our experiments. Future work also includes more research and data analysis to promote crime prediction, personalized healthcare and personalized education for individuals of all types. We also plan to develop a web application (http://www.perfect-algorithm.com) into a real-world application for such data research and analytics.

Acknowledgements

I thank my advisor, Dr. Lee, for his great guidance in my research studies so far. I further thank social networking users who constructively use such sites and create valuable data for researchers like myself to mine and predict, to advance the state of the art and benefit society and the scientific community.

References

1. Cieciuch, J. A. N. the Big Five and Belbin. (2014).
2. Schmitt, D. P., Allik, J., McCrae, R. R. & Benet-Martinez, V. The Geographic Distribution of Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations. J. Cross. Cult. Psychol. 38, 173–212 (2007).
3. Byington, T. A. Communities of Practice: Using Blogs to Increase Collaboration. Interv. Sch. Clin. 46, 280–291 (2011).
4. Bai, S. et al. Predicting big five personality traits of microblog users. Proc. - 2013 IEEE/WIC/ACM Int. Conf. Web Intell. WI 2013 1, 501–508 (2013).
5. Currents, C. Digital Footprints. Language (Baltim). 89, 148–153 (2011).
6. De Montjoye, Y. A., Quoidbach, J., Robic, F. & Pentland, A. Predicting personality using novel mobile phone-based metrics. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 7812 LNCS, 48–55 (2013).
7. Eliss, A., Abrams, M., Abrams, L. & Nussbaum, A. The Study of Personality. Personal. Theor. Crit. Perspect. 1–24 (2008). doi:http://dx.doi.org/10.4135/9781452231617.n1
8. Epley, N., Waytz, A., Akalis, S. & Cacioppo, J. T. When we need a human: Motivational determinants of anthropomorphism. Soc. Cogn. 26, 143–155 (2008).
9. Golbeck, J. Predicting Personality with Social Media. Proc. 2011 Annu. Conf. Ext. Abstr. Hum. factors Comput. Syst. CHI EA 11 253–262 (2011). doi:10.1145/1979742.1979614
10. Kahn, P. H., Ishiguro, H., Friedman, B. & Kanda, T. What is a human? - Toward psychological benchmarks in the field of human-robot interaction. Proc. - IEEE Int. Work. Robot Hum. Interact. Commun. 3, 364–371 (2006).
11. Liu, L., Preot, D. & Ungar, L. Analyzing Personality through Social Media Profile Picture Choice. AAAI Digit. Libr. (2016).
12. Martin, D. W. & D, P. The Psychology of Human Behavior 1. Psychology 1–144 (2006).
13. Revelle, W. & Revelle, W. Experimental Approaches to the Study of Personality. Personal. Res. Methods (2007).
14. Shen, J., Brdiczka, O., Ducheneaut, N., Yee, N. & Begole, B. Inferring personality of online gamers by fusing multiple-view predictions. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 7379 LNCS, 261–273 (2012).
15. Stangor, C. Introduction to Psychology. Flat World Knowl. 860 (2010).
16. Wang, Y. We know what you are going to post: User Status Generation Based on Personality and Topic. 1–8
17. Khan, M. A., Uddin, M. F. & Gupta, N. Seven V’s of Big Data understanding Big Data to extract value. Proc. 2014 Zo. 1 Conf. Am. Soc. Eng. Educ. 1–5 (2014). doi:10.1109/ASEEZone1.2014.6820689
18. Chittaranjan, G., Jan, B. & Gatica-Perez, D. Who’s who with big-five: Analyzing and classifying personality traits with smartphones. Proc. - Int. Symp. Wearable Comput. ISWC 29–36 (2011). doi:10.1109/ISWC.2011.29
19. Vinciarelli, A. & Mohammadi, G. A survey of personality computing. IEEE Trans. Affect. Comput. 5, 273–291 (2014).
20. Youyou, W., Kosinski, M. & Stillwell, D. Computer-based personality judgments are more accurate than those made by humans. Proc. Natl. Acad. Sci. 112, 1036–1040 (2015).
21. Saucier, G. & Goldberg, L. R. The Language of Personality: Lexical Perspectives on the Five-Factor Model. Five-Factor Model Personal. Theor. Perspect. 21–50 (1996).
22. Goldberg, L. From Ace to Zombie: Some explorations in the language of personality. Adv. Personal. Assess. 1, 203–234 (1982).
23. Adali, S. & Golbeck, J. Predicting Personality with Social Behavior. 2012 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min. 302–309 (2012). doi:10.1109/ASONAM.2012.58
24. Celiktutan, O., Sariyanidi, E. & Gunes, H. Let me tell you about your personality!: Real-time personality prediction from nonverbal behavioural cues. 2015 11th IEEE Int. Conf. Work. Autom. Face Gesture Recognition, FG 2015 6026 (2015). doi:10.1109/FG.2015.7163171
25. Celiktutan, O. & Gunes, H. Automatic Prediction of Impressions in Time and across Varying Context: Personality, Attractiveness and Likeability. IEEE Trans. Affect. Comput. 3045, 1–1 (2016).
26. Wald, R., Khoshgoftaar, T. & Sumner, C. Machine prediction of personality from Facebook profiles. Proc. 2012 IEEE 13th Int. Conf. Inf. Reuse Integr. IRI 2012 109–115 (2012). doi:10.1109/IRI.2012.6302998
27. Bishop, C. M. Pattern Recognition and Machine Learning. Pattern Recognit. 4, (2006).
28. Gupta, U. & Chatterjee, N. Personality Traits Identification Using Rough Sets Based Machine Learning. 2013 Int. Symp. Comput. Bus. Intell. 182–185 (2013). doi:10.1109/ISCBI.2013.44
29. Chih-Wei Hsu, Chih-Chung Chang, and C.-J. L. A Practical Guide to Support Vector Classification. BJU Int. 101, 1396–400 (2008).
30. Burges, C. C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Min. Knowl. Discov. 2, 121–167 (1998).
31. Pianesi, F. Searching for personality. IEEE Signal Process. Mag. 30, 146–158 (2013).
32. Cen, L., Ruta, D. & Ng, J. Big Education: Opportunities for Big Data Analytics. 502–506 (2015).
33. Shaun, R., Baker, J. De & Inventado, P. S. Chapter 4: Educational Data Mining and Learning Analytics. Springer Chapter 4, 61–75 (2014).
34. Hien, N. T. N. & Haddawy, P. A decision support system for evaluating international student applications. Proc. - Front. Educ. Conf. FIE 1–6 (2007). doi:10.1109/FIE.2007.4417958
35. Jindal, R. & Borah, M. D. A Survey on Educational Data Mining and Research Trends. Int. J. Database Manag. Syst. 5, 53–73 (2013).
36. Baker, R. S. J. D. Data mining for education. Int. Encycl. Educ. 7, 112–118 (2010).
37. Merceron, A. & Yacef, K. Educational data mining: A case study. Artif. Intell. Educ. Support. Learn. through Intell. Soc. Inf. Technol. 467–474 (2005). doi:10.1504/IJKESDP.2009.022718
38. Dekker, G. W., Pechenizkiy, M. & Vleeshouwers, J. M. Predicting students drop out: A case study. EDM’09 - Educ. Data Min. 2009 2nd Int. Conf. Educ. Data Min. 41–50 (2009). doi:10.1037/0893-3200.21.3.344
39. Sheard, J., Ceddia, J., Hurst, J. & Tuovinen, J. Inferring student learning behaviour from website interactions: A usage analysis. Educ. Inf. Technol. 8, 245–266 (2003).
40. El-Halees, A. Mining Students Data To Analyze Learning Behavior: a Case Study Educational Systems. Work (2008).
41. Goga, M., Kuyoro, S. & Goga, N. A Recommender for Improving the Student Academic Performance. Procedia - Soc. Behav. Sci. 180, 1481–1488 (2015).
42. Kalpana, J. K. J. & Venkatalakshmi, K. Intellectual Performance Analysis of Students’ by using Data Mining Techniques. 3, 1922–1929 (2014).
43. Al-shargabi, A. A. & Nusari, A. N. Discovering vital patterns from UST students data by applying data mining techniques. 2010 2nd Int. Conf. Comput. Autom. Eng. ICCAE 2010 2, 547–551 (2010).