A DATA MINING-BASED METHODOLOGY FOR BRAND MONITORING IN SOCIAL NETWORKS

Leandro A. Silva¹, Orlando Bisacchi Coelho, Bruno Okamoto, Maurilio Santos, and Rodrigo Sakami
Mackenzie Presbyterian University (UPM)

1 Introduction

Social networks are important for producing and disseminating people’s opinions on different subjects (Boyd & Ellison, 2007). The positive or negative opinions expressed there are used for evaluating organizations, products and services (Pang & Lee, 2008). Therefore, many organizations actively monitor their brands’ reputations, collecting their customers’ opinions and actions as expressed in social networks (Aggarwal, 2011). Organizations need to react to market changes and trends in the most timely and creative manner (Turban et al., 2011). Their decision-making process has to be swift, sometimes operating in near real time, all the more so when the threats and opportunities are strategic.

The current work studies an actual business case: the way the Coca-Cola company, in Brazil, reacted to a threat to the perceived quality of its main product, namely the spread on Twitter of a report that a dead rat had been found inside a factory-sealed Coca-Cola bottle. Using data mining, this work monitors the dissemination of related messages in the network and identifies messages that have positive or negative content (for the company). In doing so, it advances a brand monitoring methodology that any organization can use to track its standing on social networks.

2 Methodology

The proposed methodology is outlined in Figure 1. Brand-relevant posts are collected either directly from social networks or from aggregators, websites that collect content from multiple social networks (Boyd & Ellison, 2007; Ellison et al., 2009). The structure of each retrieved HTML document is used to identify the relevant part of the message. The messages are then preprocessed: HTML tags are removed, and tokenization, stopword removal and stemming are performed (Khan et al., 2010).

The number of posts per day and the frequency of each term in the vocabulary composed of all the extracted words are then computed. This generates a time series in which transition points between periods of lower and higher activity can be identified. These points correspond to changes in the dynamics of the relevant discussion on the social networks that call for analysis and interpretation. Two techniques support this:

• Word clouds: Based on how frequently each term in the vocabulary appears in the posts, the word cloud visualises the most used terms. From this the analyst can decide whether the discussion has a positive or negative bias and assess whether a marketing campaign is succeeding.
• Association rules: This technique (Pang & Lee, 2008; Pak & Paroubek, 2010; Thelwall et al., 2010; Aggarwal, 2011) detects which words appear together in the same posts. This helps in understanding the positive or negative associations of concepts reflected in the posts and in focusing future marketing campaigns.

¹ [email protected]
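The preprocessing and counting steps described in this section can be sketched roughly as follows. This is a minimal illustration and not the implementation used in the study: the stopword list, the `crude_stem` placeholder stemmer and all function names are assumptions introduced here, and a real run would rely on a full Portuguese stopword list and a proper stemmer.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the study.
STOPWORDS = {"a", "o", "na", "da", "de", "e", "um", "uma", "que"}

def strip_html(text):
    """Remove HTML tags with a simple regular expression."""
    return re.sub(r"<[^>]+>", " ", text)

def crude_stem(token):
    """Placeholder stemmer: drops a trailing plural 's' from longer tokens."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def preprocess(post_html):
    """Strip tags, tokenize into runs of letters, drop stopwords, and stem."""
    text = strip_html(post_html).lower()
    tokens = re.findall(r"[^\W\d_]+", text)
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

def term_frequencies(posts):
    """Count how often each term appears across all posts (word cloud input)."""
    counts = Counter()
    for post in posts:
        counts.update(preprocess(post))
    return counts

def posts_per_day(dated_posts):
    """Count posts per day from (day, text) pairs, giving the time series."""
    return Counter(day for day, _ in dated_posts)

# Made-up posts for illustration only.
toy_posts = ["<p>Um rato na Coca-Cola</p>", "<b>Coca Cola</b> e ratos"]
frequencies = term_frequencies(toy_posts)
```

The term counts returned by `term_frequencies` are the kind of input a word cloud needs, while `posts_per_day` yields the daily series in which periods of higher activity can be looked for.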

 

Data Collection
• capturing data from social networks

Preprocessing
• data integration
• data cleansing
• data structuring

Analysis of results
• post monitoring
• word cloud visualisation
• association rule discovery

Figure 1. Scheme for the proposed methodology.

3 Results

Texts referring to Coca-Cola were collected from the Brazilian subset of Twitter in the period from 17/09/2013 to 29/12/2013. Three major peaks in the number of tweets were identified in the time series (see Figure 2):

• the first period of increased activity: from 17/09/2013 to 23/09/2013;
• the second period: from 26/09/2013 to 28/09/2013; and
• the third period: from 22/12/2013 to 29/12/2013.

[Plot: number of tweets involving the Coca-Cola brand published per day; horizontal axis: days; the 1st, 2nd and 3rd periods are marked.]

Figure 2. Post monitoring.
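The text does not state how the periods of increased activity were delimited in the daily series. One possible approach, given here purely as an assumption, is to flag days whose tweet count exceeds the mean by `k` standard deviations and to group consecutive flagged days into periods. The function name, the threshold rule and the toy counts below are all hypothetical.

```python
from statistics import mean, pstdev

def active_periods(daily_counts, k=1.0):
    """Group consecutive days whose count exceeds mean + k * std.

    daily_counts: list of (day_label, count) pairs in chronological order.
    Returns a list of (first_day, last_day) pairs.
    """
    counts = [count for _, count in daily_counts]
    threshold = mean(counts) + k * pstdev(counts)
    periods, start, prev = [], None, None
    for day, count in daily_counts:
        if count > threshold:
            if start is None:
                start = day          # a new period begins
            prev = day               # latest day still above the threshold
        elif start is not None:
            periods.append((start, prev))
            start = None
    if start is not None:            # series ended inside a period
        periods.append((start, prev))
    return periods

# Made-up counts for illustration only; not the data behind Figure 2.
toy_counts = [("d1", 10), ("d2", 12), ("d3", 90), ("d4", 95),
              ("d5", 11), ("d6", 9), ("d7", 80), ("d8", 10)]
periods = active_periods(toy_counts)
```

A fixed threshold of this kind is sensitive to the choice of `k` and to very large spikes, so in practice an analyst would likely inspect the plot alongside any automatic rule.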

The first activity peak occurred after the Brazilian press broke the news that a rat had been found inside an unopened Coca-Cola bottle. Table 1 presents a word cloud visualization of the most frequent words appearing in this period. The most frequent words were “coca”, “cola” and “rato” (“rato” is the Portuguese word for “rat”). The association algorithm identified the following rule: 50% of all the posts mentioned both “rato” and “coca” (support of 0.50). Moreover, the probability that a person who used the word “rato” also used the word “coca” was 99% (confidence of 0.99).

The second peak of activity happened when Coca-Cola Brazil released a video on YouTube showing its manufacturing process, claiming that the quality of the process would prevent a rat from ending up inside a factory-sealed bottle. During this period the share of posts matching the rule fell to 38% (support of 0.38), although the confidence level remained the same.

The third peak is related to the beginning of Coca-Cola’s Christmas advertising campaign. It can be seen, from either the word cloud or the association rule, that the number of tweets including “rato” decreased (support of 0.18); the confidence also decreased (to 0.89), pointing to Brazilians forgetting the issue, probably because of the brand’s marketing campaigns.

Period   Word cloud visualisation   Association rule discovery
                                    Rule           Support   Confidence
1st      [word cloud image]         rato -> coca   0.50      0.99
2nd      [word cloud image]         rato -> coca   0.38      0.99
3rd      [word cloud image]         rato -> coca   0.18      0.89

Table 1. Analysis of the results.
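To make the two measures in Table 1 concrete, the sketch below computes support and confidence for a single rule under the standard definitions: support is the share of all posts containing both terms, and confidence is the share of posts containing the antecedent that also contain the consequent. The function name and the toy posts are assumptions for illustration; the mining algorithm actually used in the study is not specified in the text.

```python
def support_confidence(posts, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    posts: list of token sets, one set per post.
    """
    n_total = len(posts)
    n_antecedent = sum(1 for p in posts if antecedent in p)
    n_both = sum(1 for p in posts if antecedent in p and consequent in p)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Made-up posts for illustration only; not the tweets analysed above.
toy_posts = [
    {"rato", "coca", "cola"},
    {"rato", "coca"},
    {"coca", "cola", "natal"},
    {"rato", "garrafa"},
]
support, confidence = support_confidence(toy_posts, "rato", "coca")
```

In this toy set two of the four posts contain both terms and three contain “rato”, so the rule rato -> coca has a support of 0.50 and a confidence of about 0.67.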

 

4 References

Aggarwal, C. C. (2011). An introduction to social network data analytics (pp. 1-15). Springer US.
Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern information retrieval (Vol. 463). New York: ACM Press.
Boyd, D. & Ellison, N. B. (2007). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1), 210-230.
Chowdhury, G. (2010). Introduction to modern information retrieval. Facet Publishing.
Ellison, N. B., Lampe, C. & Steinfield, C. (2009). FEATURE Social network sites and society: current trends and future possibilities. interactions, 16(1), 6-9.
Khan, A., Baharudin, B., Lee, L. H. & Khan, K. (2010). A Review of Machine Learning Algorithms for Text-Documents Classification. Journal of Advances in Information Technology, 1(1).
Pak, A. & Paroubek, P. (2010, May). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta.
Pang, B. & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Eng. Bull., 24(4), 35-43.
Thelwall, M., Wilkinson, D. & Uppal, S. (2010). Data mining emotion in social network communication: Gender differences in MySpace. Journal of the American Society for Information Science and Technology, 61(1), 190-199.
Turban, E., Sharda, R., Aronson, J. E. & King, D. (2013). Business Intelligence: A Managerial Perspective. 3rd ed. Prentice-Hall.