Detecting disease trends using Twitter data

Validating search protocols for mining of health and disease events on Twitter
Aditya Lia Ramadona1,2*, Lutfan Lazuardi3, Sulistyawati1,4, Anwar Dwi Cahyono5, Åsa Holmner6, Hari Kusnanto3, Joacim Rocklöv1
The International Conference on Public Health (ICPH), Solo, Indonesia; September 14-15, 2016
https://arxiv.org/abs/1608.05910

Introduction

Twitter
• free social networking and microblogging service
• 140-character messages: news, events, personal feelings and experiences, …
• May 2016: 24.34 million active Indonesian users, ~10% (Statista, 2016)

Twitter offers streams of public data that
• might contain health-related information
• can be explored for public health monitoring and surveillance purposes (Paul et al. 2016)

Indonesia Social Media Trend (Jakpat, 2016)

Previous studies
• Signorini et al. 2011: tracking levels of disease activity
• Eichstaedt et al. 2015: predicting heart disease mortality
• Strom et al. 2013: measuring health-related quality of life
• many more…

Methodological challenges
• data and language processing
• model development


Subjects and Methods

Develop groups of words and phrases relevant to disease symptoms and health outcomes in Bahasa Indonesia

[Diagram labels: historical Twitter; 14d; real-time Twitter stream]

Sentiment analysis
• examining each tweet from the Twitter feeds
• the decisions were made by people with expert knowledge
• with millions of tweets, this is time-consuming and inefficient

Replicating expert assessment
• develop a model, interpret results and adjust the model
• make predictions

Results: text analysis

Historical Twitter feeds: 390 tweets
• search query: "rumah OR sakit OR rawat OR inap OR demam OR panas -cuaca OR berdarah OR pendarahan OR tombosit OR badan OR muntah OR badan OR tua OR ':('"

Preprocessing
• removing retweets and eliminating some noise
• removing punctuation, numbers, capitalization, and Bahasa Indonesia stop-words (e.g. kamu and aja)

Example (tweet 107)
• before: "@XYZ kamu izin aja, bilang kamu sakit :((" (roughly: "just ask for leave, say you're sick")
• after: "xyz izin bilang sakit"
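The preprocessing steps listed above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the stop-word set contains just the two words named on the slide, and treating tweets that start with "RT @" as retweets is an assumed heuristic. The function names `is_retweet` and `preprocess` are hypothetical.

```python
import re

# Illustrative subset: only "kamu" and "aja" are named on the slide.
STOP_WORDS = {"kamu", "aja"}


def is_retweet(text):
    """Assumed heuristic: a tweet starting with 'RT @' is a retweet."""
    return text.lstrip().startswith("RT @")


def preprocess(text):
    """Lowercase, replace punctuation and digits with spaces, drop stop-words."""
    lowered = text.lower()
    # Keeps ASCII letters and whitespace only; accented letters would be lost.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    tokens = [tok for tok in letters_only.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


tweets = [
    "@XYZ kamu izin aja, bilang kamu sakit :((",
    "RT @XYZ kamu izin aja, bilang kamu sakit :((",
]
cleaned = [preprocess(t) for t in tweets if not is_retweet(t)]
print(cleaned)  # ['xyz izin bilang sakit']
```

Applied to the example tweet from the slide, the sketch reproduces the cleaned text shown there; a real pipeline would need a full Bahasa Indonesia stop-word list.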

1,632 words
• the words most highly correlated with sakit (sick, ill, pain):
  hati (0.23): sakit hati ~ shame, broken heart, …
  rasa (0.13): rasa sakit ~ pain
  perut (0.12): sakit perut ~ stomach ache
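The slides do not say how the word correlations were computed. One common choice, assumed here, is the Pearson correlation between per-tweet counts of two terms. The sketch below uses made-up tweets, so its numbers are not the study's; `term_counts` and `pearson` are hypothetical helper names.

```python
from math import sqrt


def term_counts(docs, term):
    """Occurrences of `term` in each whitespace-tokenised document."""
    return [doc.split().count(term) for doc in docs]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up, already preprocessed tweets; not data from the study.
docs = [
    "sakit hati",
    "sakit perut",
    "rasa sakit perut",
    "izin bilang sakit",
    "badan panas",
    "rumah tua",
]
sakit = term_counts(docs, "sakit")
for word in ("hati", "rasa", "perut"):
    print(word, round(pearson(sakit, term_counts(docs, word)), 2))
```

A word that tends to appear in the same tweets as sakit gets a positive value; a word that never co-occurs with it gets a negative one.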

Figure 1. Words that appear more than 10 times

Results: model development

Predictors
• the 22 highest-frequency words
• counting the number of predictor words in a tweet
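As a rough sketch of how such predictors could be built, the code below picks the k most frequent words and then counts how many tokens of each tweet belong to that set. The slides do not list the 22 words, so the tweets and the value k = 2 are made up for illustration, and `top_words` and `count_predictor_words` are hypothetical names.

```python
from collections import Counter


def top_words(cleaned_tweets, k):
    """Return the k most frequent tokens across all preprocessed tweets."""
    counts = Counter(tok for tweet in cleaned_tweets for tok in tweet.split())
    return [word for word, _ in counts.most_common(k)]


def count_predictor_words(cleaned_tweet, predictors):
    """Number of tokens in one tweet that belong to the predictor set."""
    predictor_set = set(predictors)
    return sum(1 for tok in cleaned_tweet.split() if tok in predictor_set)


# Made-up preprocessed tweets; the study used the top 22 words of 390 tweets.
tweets = ["sakit perut", "sakit hati", "rasa sakit perut", "rumah tua"]
predictors = top_words(tweets, k=2)
features = [count_predictor_words(t, predictors) for t in tweets]
print(predictors, features)  # ['sakit', 'perut'] [2, 1, 2, 0]
```

Whether the study used one total count per tweet, as here, or a separate count per predictor word is not stated on the slide.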

Classification and Regression Trees (CART) model (Breiman et al. 1983)
• rpart package for R (Therneau et al. 2015)
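To give a feel for what a classification tree does at each node, the sketch below searches for the single threshold on one numeric feature that minimises the weighted Gini impurity of the two child nodes. It is a one-split illustration in Python, not rpart and not the authors' model; the data are made up, and `gini` and `best_split` are hypothetical names.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (True/False)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Threshold t minimising the weighted Gini impurity for the rule
    `value < t`. Returns (threshold, impurity); threshold is None if no
    split improves on the unsplit node."""
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (t, impurity)
    return best


# Made-up data: predictor-word count per tweet and an expert label
# (True = health-related). Not data from the study.
counts = [0, 0, 1, 1, 2, 3, 3, 4]
labels = [False, False, False, True, True, True, True, True]
threshold, impurity = best_split(counts, labels)
print(threshold, round(impurity, 4))  # 1.5 0.1875
```

A full tree repeats this search recursively on each child node and over every predictor, then prunes; a library such as rpart handles those steps.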

390 tweets from historical Twitter feeds
• 273 tweets (70%): training
• 117 tweets (30%): validation

1,145,649 tweets from Twitter stream feeds: testing
• Indonesia: between 11°S and 6°N and between 95°E and 141°E
• 7 days: 26 July to 1 August 2016

• 100 sampled from the 6,109 TRUE results
• 100 sampled from the 1,139,540 FALSE results
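The data handling above involves two simple operations: a 70/30 split of the 390 labelled tweets and a geographic filter on the stream. A minimal sketch follows, assuming a random shuffle with an arbitrary seed (the slides do not say how the split was drawn) and coordinates in decimal degrees; `in_bounding_box` and `train_validation_split` are hypothetical names.

```python
import random

# Bounding box from the slides: 11°S to 6°N, 95°E to 141°E.
LAT_MIN, LAT_MAX = -11.0, 6.0
LON_MIN, LON_MAX = 95.0, 141.0


def in_bounding_box(lat, lon):
    """True if a (lat, lon) point in decimal degrees lies inside the box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it; the seed is an arbitrary choice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, validation = train_validation_split(range(390))
print(len(train), len(validation))  # 273 117
print(in_bounding_box(-7.8, 110.4))  # True (approximate location of Yogyakarta)
```

With 390 items and a 0.7 fraction, the split sizes match the 273 and 117 tweets reported on the slide.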


Results

Model performance                Validation   Testing
AUC                              0.82         0.70
Sensitivity (%)                  80.0         42.0
Specificity (%)                  84.6         98.0
Positive predictive value (%)    86.7         95.5
Negative predictive value (%)    77.2         62.8
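For readers who want to see how the four percentage metrics relate to a confusion matrix, here is a short worked sketch. The slides do not show the underlying counts; the counts below are an assumption, back-calculated so that they sum to 117 and 200 tweets and reproduce the reported percentages. AUC is not covered, since it needs the model's scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV (as percentages) from counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Assumed counts, back-calculated to match the reported percentages;
# the slides do not show the confusion matrices.
validation = classification_metrics(tp=52, fp=8, tn=44, fn=13)  # 117 tweets
testing = classification_metrics(tp=42, fp=2, tn=98, fn=58)     # 200 tweets

for name, metrics in (("validation", validation), ("testing", testing)):
    print(name, {key: round(value, 1) for key, value in metrics.items()})
```

Under these assumed counts, the drop in sensitivity from validation to testing comes from the larger share of false negatives, while the few false positives keep specificity and positive predictive value high.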

Limitations + Challenges = Future Work

Team members involved
• academics, health workers

Twitter users
• telecommunications infrastructure
• characteristics of people

Methods
• data: streaming (Indonesia, 7 days at 24 hours a day, ~1.5 GB in CSV format)
• model: CART, RandomForest, GBM, …

Summary

Monitoring of public sentiment on Twitter + contextual knowledge
• a nearly real-time proxy for health-related indicators

Models do not replace expert judgment
• accurately analyze small amounts of information (tweets)
• improve and refine the model
• bias and emotion: integrate assessments of many experts


1 Department of Public Health and Clinical Medicine, Epidemiology and Global Health, Umeå University
2 Center for Environmental Studies, Universitas Gadjah Mada
3 Department of Public Health, Faculty of Medicine, Universitas Gadjah Mada
4 Department of Public Health, Universitas Ahmad Dahlan
5 District Health Office, Yogyakarta
6 Department of Radiation Sciences, Umeå University
* [email protected]
