Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 59 (2015) 254 – 261
International Conference on Computer Science and Computational Intelligence (ICCSCI 2015)
Data mining of automatically promotion tweet for products and services using Naïve Bayes algorithm to increase twitter engagement followers at PT. Bobobobo
James Luke1, Suharjito2*
1 PT. Bobobobo, Gereja Theresia No.16, 10350 Menteng, Jakarta, Indonesia
2 Magister of Information Technology, Bina Nusantara University, Jakarta, Indonesia
Abstract
The large number of Twitter users in Indonesia gives companies an opportunity to promote products and services via Twitter. To maximize promotion on Twitter, more followers are needed. In this research, data mining is used to discover trend information about products or services among followers, in order to increase follower engagement through automatic promotion tweets. The Naïve Bayes algorithm is used to classify followers' tweets by products or services; the classified tweets are then analyzed to obtain trend words, which are afterwards matched to promotion tweets. The results show that Naïve Bayes classification accuracy reached 90.31% on product tweet test data and 80.91% on service tweet test data, and that the increase in follower engagement reached 69%.
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the International Conference on Computer Science and Computational Intelligence (ICCSCI 2015).
Keywords: Data mining, tweet promotion, engagement followers, Naïve Bayes, Twitter
1. Introduction
Internet use has grown significantly in Indonesia over the last 10 years. According to [1], the number of Internet users in
* Corresponding author. Tel.: 08128400536. E-mail address:
[email protected]
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the International Conference on Computer Science and Computational Intelligence (ICCSCI 2015) doi:10.1016/j.procs.2015.07.550
James Luke and Suharjito / Procedia Computer Science 59 (2015) 254 – 261
2012 reached 63 million people, or 24.23 percent of the country's total population, and was predicted to rise by about 30 percent the following year. This rapid growth gives the Internet great potential as a promotion medium for business. According to [2], sales promotion has a significant impact on organizational performance. Social media is now one of the promotion media. The survey in [1] found that most Internet users access the Internet for social media (87.8%), and according to [3] Indonesia ranks fourth in the number of Facebook users and fifth for Twitter. Several companies, such as Dell, LEGO and Sony, have promoted successfully through Twitter. A key to successful promotion is follower engagement with promotion tweets [4, 5, 6]. In this research, the Naïve Bayes Classifier (NBC) algorithm is used to determine, based on the classification results, which promotional tweets are appropriate to share with followers. According to [7], NBC has a better accuracy rate than other classification models.

2. Method
Data mining refers to the overall process of collecting and analyzing data, developing inductive learning models, and adopting practical decisions and consequent actions based on the knowledge gained [8]. Meanwhile, [9] describes data mining as extracting, or mining, knowledge from vast amounts of data. Naïve Bayes Classification (NBC) is a data mining algorithm that applies Bayes' theorem to classification. For each class, Naïve Bayes calculates the probability that the class is the correct decision given the information available about the object. The algorithm assumes that the attributes of an object are independent of one another [10]:

$$P(C \mid D) = \frac{P(C)\,P(D \mid C)}{P(D)} = \frac{P(C)\prod_{i} P(w_i \mid C)}{P(D)} \qquad (1)$$

where $C$ is a class, $D$ is a tweet and $w_i$ are the words in $D$.
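As an illustration of Eq. (1), the following is a minimal sketch of a word-based Naïve Bayes classifier. It is not the implementation used in this research: the class name `NaiveBayesTextClassifier`, the Laplace smoothing constant `alpha` and the toy training tokens are assumptions made for the example. The denominator P(D) is omitted because it is the same for every class, and logarithms are used to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Word-based Naive Bayes following Eq. (1); an illustrative sketch only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (assumption, not stated in the paper)
        self.class_counts = Counter()            # number of training tweets per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def fit(self, documents, labels):
        for words, label in zip(documents, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def log_scores(self, words):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log P(C)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # log P(w_i | C), smoothed so unseen words do not give log(0)
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Toy training data (hypothetical tokens, loosely based on the example tweets).
documents = [
    ["tas", "sepatu", "kerudung", "baju"],
    ["dress", "batik", "kemeja"],
    ["kelas", "yoga"],
    ["hotel", "murah", "surabaya"],
]
labels = ["product", "product", "service", "service"]

model = NaiveBayesTextClassifier().fit(documents, labels)
print(model.predict(["sepatu", "batik"]))
print(model.predict(["hotel", "yoga"]))
```

With this toy data the two calls select the product and service classes respectively; real tweets would first need tokenization and stop-word removal.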
The research process is as follows:
Fig.1 Research Framework
2.1 Data
The data used in this paper are Indonesian-language tweets from Jakarta, collected from June 2013 until February 2014. The collected tweets are divided into two categories: products and services. Some example tweets:
Products
• “Liat tas, sepatu, kerudung, baju bawaannya pengen beli semua”
• “Nyari dress batik: kalo dapet yang disuka, antara mahal bingit di kantong atau dressnya... sepasang sama kemeja...”
• “meh tuku tas Carrier sing 35 L sing murah wae, :D”
Services
• “kelas yoga after hours lumayan bikin badan lentur yah cyin...”
• “liburan ke paris, traveling mahameru, study banding ke ausi”
• “Cari hotel murah di dekat pusat Kota Surabaya? ada saran?”
2.2 Tweet classification with NBC
The collected tweets are used as the training data set and grouped into the classes products and services. The tweets are then classified with the NBC algorithm, and the accuracy of the outcome is measured.
Fig.2 Flowchart Classification Tweet using NBC
2.3 Selection of trend words and tweets
Once the tweets have been classified into the classes products and services, the next step selects trend words and compares them with the tweets that will be promoted automatically.
Fig.3 Flowchart Selecting and Automatically Tweet Promotion
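The flow in Fig. 3 can be sketched roughly in Python as below. This is an illustrative sketch under several assumptions, not the system built in this research: the database tables are replaced by in-memory lists, the stop-word list `STOP_WORDS` is hypothetical, trend words are assumed to be the five most frequent remaining words, and the match score is assumed to be the number of promotion keywords found among the trend words. Only the token rule (3-25 letters, no links) and the promotion attributes (time range, keywords, minimum score) are taken from the step descriptions in this section.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical stop-word list; the list actually used in the research is not given.
STOP_WORDS = {"yang", "dan", "atau", "ada", "sama", "semua"}

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z]+")


def tokenize(tweet):
    """Return lowercase tokens of 3-25 letters, ignoring links."""
    text = URL_PATTERN.sub(" ", tweet).lower()
    return [w for w in WORD_PATTERN.findall(text) if 3 <= len(w) <= 25]


def trend_words(tweets, top_n=5):
    """Return the top_n most frequent non-stop-word tokens (assumed ranking)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in tokenize(tweet) if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


@dataclass
class PromotionTweet:
    text: str
    keywords: frozenset
    start: datetime
    end: datetime
    min_score: int


def match_score(promotion, trends):
    """Assumed score: number of promotion keywords found among the trend words."""
    return len(promotion.keywords & set(trends))


def promotions_to_send(promotions, trends, now):
    """Select promotions inside their time range whose score meets the minimum."""
    return [
        p for p in promotions
        if p.start <= now <= p.end and match_score(p, trends) >= p.min_score
    ]


product_tweets = [
    "Liat tas, sepatu, kerudung, baju bawaannya pengen beli semua",
    "Nyari tas sama sepatu murah http://example.com/promo",
]
trends = trend_words(product_tweets)
promo = PromotionTweet(
    text="New bags and shoes this week",  # hypothetical promotion text
    keywords=frozenset({"tas", "sepatu"}),
    start=datetime(2014, 9, 1),
    end=datetime(2014, 9, 30),
    min_score=2,
)
print(trends)
print(promotions_to_send([promo], trends, now=datetime(2014, 9, 15)))
```

Keeping the score function separate makes it easy to swap in a different matching rule, since the exact scoring used by the system is not specified here.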
The steps for selecting trend words and the tweets to be promoted are as follows:
1) Tokenization
Words are separated from each tweet as tokens. A word is considered valid if it consists of 3-25 letters and is not a link or URL.
2) Stop word
Words that are considered to carry no meaning are removed, based on a defined list. The results of this process are stored in a database table.
3) Trend Words
A query on the database table retrieves the 5 trend words from each of the categories products and services.
4) Promotion Tweets
Promotion tweets are created by the social media admin. Each tweet specifies the time range in which it is to be promoted, its keywords, and a minimum score for a match with the trend words. The tweets are stored in a database table.
5) Tweet Automatically
The database is queried to find promotion tweets that match the trend words. If the criteria match, the promotion tweet is tweeted automatically.
2.4 Evaluation system
This research used two stages of evaluation:
1) Analysis of the accuracy of the Naïve Bayes classifier
To measure the performance of the Naïve Bayes classifier, the tweets are arranged into three variations of training and test data, and each variation is tested with product tweets and service tweets. The distribution of training and test data is shown in the
table below, where P = product tweets and S = service tweets.
Table 1. Composition and Variation Data Trains

Data Test (tweet)   Data Train (tweet)
                    Product = Service   Product > Service   Product < Service
2000 P/S            500 P & 500 S       700 P & 300 S       300 P & 700 S
1500 P/S            1500 P & 1500 S     2000 P & 1000 S     1000 P & 2000 S
1000 P/S            2500 P & 2500 S     3500 P & 1500 S     1500 P & 3500 S
500 P/S             3500 P & 3500 S     4900 P & 2100 S     2100 P & 4900 S
50 P/S              4500 P & 4500 S     6300 P & 2700 S     2700 P & 6300 S
2) Analysis of the increase in follower engagement
To measure follower engagement, the tweet engagement rate is calculated as follows [11]:

$$\text{tweet engagement rate} = \frac{\text{replies} + \text{retweets}}{\text{number of tweets}} \times 100 \qquad (2)$$
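Eq. (2) can be expressed as a small helper function. The sketch below is only an illustration: the function name `tweet_engagement_rate` and the sample counts are hypothetical, and the denominator is read as the number of tweets, as written in Eq. (2).

```python
def tweet_engagement_rate(replies, retweets, tweet_count):
    """Tweet engagement rate in percent, following Eq. (2)."""
    if tweet_count <= 0:
        raise ValueError("tweet_count must be positive")
    return (replies + retweets) / tweet_count * 100


# Hypothetical counts for one measurement period, not data from this research.
rate = tweet_engagement_rate(replies=3, retweets=5, tweet_count=50)
print(f"{rate:.2f}%")
```

Rejecting a non-positive tweet count avoids a division by zero for periods in which no tweets were sent.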
3. Results and discussion
3.1 Results of Tweet Classification using NBC
Using the training data composition variations with test tweets from the product category gave the following results:
Table 2. Results NBC using Data Tweet Products

Data Train          Data Test (tweet)
                    2000     1500     1000     500      50       Average
Product = Service   77.65%   87.20%   91.90%   96.80%   98.00%   90.31%
Product > Service   18.85%   20.13%   11.20%   12.80%   3.20%    13.24%
Product < Service   96.95%   98.20%   98.90%   99.20%   99.60%   98.57%
Using the training data composition variations with test tweets from the service category gave the following results:
Table 3. Results NBC using Data Tweet Services

Data Train          Data Test (tweet)
                    2000     1500     1000     500      50       Average
Product = Service   70.65%   73.50%   79.60%   85.80%   95.00%   80.91%
Product > Service   97.05%   98.86%   99.00%   99.10%   99.60%   98.72%
Product < Service   6.45%    10.73%   4.40%    3.00%    4.30%    5.77%
Using the training data composition variations with a combination of product and service test tweets gave the following results:
Table 4. Results NBC Using Combination Data Tweet Products and Services

Data Test (tweet)   Data Train (tweet)   Accuracy
2000 P & 2000 S     500 P & 500 S        74.50%
1500 P & 1500 S     1500 P & 1500 S      78.63%
1000 P & 1000 S     2500 P & 2500 S      83.55%
500 P & 500 S       3500 P & 3500 S      88.80%
50 P & 50 S         4500 P & 4500 S      92.10%
Average                                  83.51%
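The accuracy figures in the tables above are percentages of correctly classified test tweets, and the Average column is the arithmetic mean over the five test sizes. A minimal sketch of both calculations is given below; the function name `accuracy` and the small label lists are hypothetical, while the five percentages are the Product = Service row of Table 2.

```python
from statistics import mean


def accuracy(predicted, actual):
    """Percentage of test tweets whose predicted class equals the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual) * 100


# Hypothetical predictions for four test tweets.
predicted = ["product", "service", "product", "product"]
actual = ["product", "service", "service", "product"]
print(accuracy(predicted, actual))

# Per-test-size accuracies from the Product = Service row of Table 2.
product_equal_split = [77.65, 87.20, 91.90, 96.80, 98.00]
average = round(mean(product_equal_split), 2)
print(average)
```

The mean of the five values reproduces the 90.31% reported for product tweets.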
From the experiments above, the authors conclude that NBC gives fairly good accuracy in classifying the tweets. Accuracy decreases as the number of test tweets increases and the number of training tweets decreases. The composition of the training data greatly affects the classification accuracy of NBC.
3.2 Results of automatic tweet promotion
The experiment conducted from September 2014 to February 2015 gave the following results:
Fig.4 Engagement Twitter @_bobobobo_
In the chart above, the averages are calculated by comparing the numbers of re-tweets, mentions and followers added before and after implementation. The details are shown in the table below.
Table 5. Improvement Engagement Follower

           Retweet   Mention   Follower
Before     59.33     26.67     56.33
After      83        60.5      92.16
Results    39%       120%      63%
The table above shows that increases in re-tweets and mentions greatly affect the growth of follower engagement. The follower engagement results from the increased tweet engagement rate after implementation. The comparison before and after is shown in the graph below:
Fig.5 Tweet Engagement Rate Before and After Implementation
This research concludes that the increased tweet engagement rate has a positive effect on follower engagement.
4. Conclusion
The research concluded:
1. The NBC algorithm provides accuracy of up to 90.31% on product tweet test data and 80.91% on service tweet test data. With the combined product and service test data, NBC delivers an accuracy of 83.51%.
2. A more extensive stop word list would help in determining the trend words from the collections of product and service tweets.
3. Twitter activity increased as a result of this research: re-tweets by 39%, mentions by 120% and new followers by 69%. Re-tweets and mentions affect follower engagement.
4. After this research, the tweet engagement rate reached a highest value of 17.44% and a lowest value of 4.72%, compared with a highest value of 3.80% and a lowest value of 2.90% in the previous study. This kind of promotion therefore gave a better result.
5. Before a tweet is sent, the trend words from followers can be analyzed, which leads to a better response from followers.
References
1. APJII. Profil Pengguna Internet Indonesia 2012. Retrieved February 4, 2014, from Indonesian Internet Service Provider Association: http://www.apjii.or.id/v2/upload/Laporan/Profil%20Internet%20Indonesia%202012%20(INDONESIA).pdf
2. Manuere, F., Gwangwava, E., & Gutu, K. Sales Promotion As a Critical Component Of a Small Business Marketing Strategy. Institute of Interdisciplinary Business Research. 2012; 4(6): 1157-1169.
3. Kominfo. Pengguna Internet di Indonesia 63 Juta Orang. Retrieved February 6, 2014, from kominfo.go.id: http://kominfo.go.id/index.php/content/detail/3415/Kominfo+%3A+Pengguna+Internet+di+Indonesia+63+Juta+Orang/0/berita_satker
4. Chandra, M. Manfaat Dahsyat Sosial Media bagi e-Commerce. Retrieved February 6, 2014, from detikInet: http://inet.detik.com/read/2013/06/24/092646/2281884/398/1/3-manfaat-dahsyat-sosial-media-bagi-e-commerce
5. Forrester Consulting. Engaged Social Followers Are Your Best Customers: How Marketers Can Leverage Social Tools Throughout The Customer Life. Wildfire; 2013.
6. Funk, T. Social media playbook for business: reaching your online community with Twitter, Facebook, LinkedIn, and more. Santa Barbara: Praeger; 2011.
7. Xhemali, D., Hinde, C. J., & Stone, R. G. Naïve Bayes vs. Decision Trees vs. Neural Networks. IJCSI International Journal of Computer Science Issues. 2009; 4(1).
8. Vercellis, C. Business Intelligence: Data Mining and Optimization For Decision Making. Chennai: John Wiley & Sons; 2009.
9. Han, J., Kamber, M., & Pei, J. Data mining: concepts and techniques. USA: Morgan Kaufmann Publishers; 2011.
10. Olson, D. L., & Delen, D. Advanced Data Mining Techniques. Berlin: Springer; 2008.
11. Socialbakers. [Infographic] Social Media Marketing Is Not Just About 1 Metric! Retrieved February 11, 2014, from Socialbakers: http://www.socialbakers.com/blog/688-infographic-social-media-marketing-is-not-just-about-1-metric