Cardiovascular risk assessment in athletes

Cardiovascular risk assessment in athletes by means of statistical learning
D. Barbieri & L. Zaccagni
University of Ferrara, Italy

Cardiovascular diseases

In 2012, cardiovascular diseases (CVDs) were the leading cause of death among noncommunicable diseases (NCDs): 17.5 million deaths.

Davide Barbieri

IUAES 2016 Dubrovnik


Prevalence

Deaths per 100k people, year 2012: top 5 and bottom 5 countries

Country             Both sexes   Female   Male
Turkmenistan           712        618     820
Kazakhstan             635        515     808
Mongolia               586        483     723
Uzbekistan             577        509     656
Kyrgyzstan             549        462     660
Republic of Korea       92         76     112
Canada                  88         68     112
Israel                  86         70     105
France                  85         65     111
Japan                   82         58     108

Stat                Both sexes   Female   Male
Mean                   271        242     306
SD                     124        115     149
Max                    712        618     820
Min                     82         58     105

Source: WHO web site http://apps.who.int/gho/data/node.main.A865CARDIOVASCULAR?lang=en , accessed 15 Apr 2016


Causes and consequences
• Main causes: smoking, physical inactivity, unhealthy diet and alcohol
• Medical care of CVDs is expensive, which creates a conflict of interest between physicians (defensive medicine) and public administration (spending reviews)
• In developed countries, lower socioeconomic groups have a greater prevalence of risk factors and higher mortality
• In developing countries, as the prevalence of CVDs increases, the burden will shift to the lower socioeconomic groups

Source: WHO http://www.who.int/cardiovascular_diseases/prevention_control/en/ , accessed 15 Apr 2016


Purposes
• To predict the risk of CVDs in an active population, minimizing false alarms and false negatives
• To optimize public spending in medical care (sustainable health care)
• CVDs are quite rare among athletes who train consistently
• Still, these subjects may be at risk because of repeated and intense efforts
• Athletes are routinely monitored during sport medical examinations by means of ECG
• In case of a positive ECG, they are warned against intense sport practice


Sample
• 33,126 Croatian athletes, both sexes, 4-69 years old
• Data collected at the Sport Policlinic in Zagreb:
  • Sex
  • Age
  • Weight
  • Height
  • Pulse rate
  • Blood pressure (systolic and diastolic)
• ECG outcome: P (≈9%) or N (≈91%)


Methods
• Binary classification by means of statistical learning
• Logistic regression (STATA) & data mining (WEKA)
• Cross-validation
• Height and weight (correlated) replaced by BMI = weight/height²
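The BMI substitution can be illustrated with a minimal Python sketch. The slides do not state the units; the standard convention (weight in kilograms, height in metres) is assumed here, and the function name `bmi` and the example athlete are hypothetical, not taken from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared.

    Assumes weight in kilograms and height in metres (the usual
    convention; the slides do not state the units).
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical athlete, not a record from the study sample.
example = bmi(70.0, 1.75)
print(round(example, 1))
```

Replacing two correlated predictors with a single derived one is a common way to limit collinearity in a logistic model; the sketch only shows the derivation, not the authors' STATA or WEKA workflow.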


Classification issues
• Accuracy (correct guesses / total) is not appropriate: with ≈91% N outcomes, always guessing N is ≈91% accurate but detects no athlete at risk
• ROC and Youden index J = TPR + TNR - 1
• FN (i.e. unpredicted death risk) has a higher cost than FP (extra ECG)
• Sensitivity may be more important than specificity: weighted J?

               P     N
At risk        TP    FN
Not at risk    FP    TN
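The measures named on this slide can be computed directly from the four cells of the confusion matrix. The sketch below is illustrative only: the counts are hypothetical (1000 athletes with 9% at risk, matching the P share reported for the sample), and `weighted_j` is one possible form of a weighted Youden index, assumed here because the slides only raise it as a question.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, sensitivity (TPR), specificity (TNR) and Youden's J."""
    tpr = tp / (tp + fn)  # share of at-risk subjects flagged P
    tnr = tn / (tn + fp)  # share of not-at-risk subjects flagged N
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "TPR": tpr,
        "TNR": tnr,
        "J": tpr + tnr - 1,
    }


def weighted_j(tpr: float, tnr: float, w: float = 0.5) -> float:
    """Assumed weighted variant: 2*(w*TPR + (1-w)*TNR) - 1.

    With w = 0.5 it reduces to the ordinary J = TPR + TNR - 1;
    w > 0.5 gives more weight to sensitivity.
    """
    return 2 * (w * tpr + (1 - w) * tnr) - 1


# Hypothetical classifier that always answers N on 1000 athletes, 90 at risk.
always_n = rates(tp=0, fn=90, fp=0, tn=910)
print(always_n)
```

In this example accuracy is 0.91 while J is 0, which is why accuracy alone says little on an unbalanced sample.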


Logistic regression


First results: ROC
• Highly significant and very good fit
• Still, not predictive: AUC=0.55
• Collected data not meaningful?
• Logistic regression not suitable?
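To put AUC=0.55 in context: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The sketch below computes it by pairwise comparison; the function name `auc` and the scores are hypothetical and unrelated to the study's model output.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks for positive and negative ECG outcomes.
pos = [0.12, 0.08, 0.10, 0.09]
neg = [0.09, 0.11, 0.07, 0.10, 0.08]
print(auc(pos, neg))
```

The quadratic loop is fine for a small illustration; on a sample of tens of thousands of subjects a rank-based formula would be preferable.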


Exploratory data analysis
• J. Tukey (1977)
• If the causes are tobacco, excessive body weight and lack of physical activity, the collected variables (BMI, blood pressure, pulse rate) should be informative
• Some biomedical variables, like pulse rate, may be abnormally low or abnormally high, unlike blood pressure, for example, which increases risk only as it rises
• Thresholds are available from the medical literature
• Still, we adopted a data-driven approach in order to find thresholds inductively in our sample


Risk as a function of pulse rate and blood pressure

[Figure, two panels: "Risk as a function of pulse rate" and "Risk as a function of blood pressure"; each panel shows risk for the categories low, normal and high.]

Data mining
• Oversampling using SMOTE (Chawla et al. 2002) and undersampling were applied in order to balance the training data set and improve sensitivity
• A filtered, rule-based classifier (OneR) was trained in order to find optimal cut-off values
• Two thresholds (LT and HT) were found:
  • If pulse rate < LT then P
  • If pulse rate > HT then P
  • Else N
• Results: AUC=0.73; TPR=0.72; TNR=0.73; J=0.45
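The two-threshold rule and the idea of balancing the training set can be sketched as follows. This is not the WEKA pipeline: it does not implement SMOTE or OneR, it only shows plain random undersampling and how the learned rule would be applied. The cut-off values for `LT` and `HT` below are hypothetical placeholders (the slides do not report them), as are the function names and the toy records.

```python
import random


def classify(pulse_rate: float, lt: float, ht: float) -> str:
    """Two-threshold rule: P below lt or above ht, otherwise N."""
    if pulse_rate < lt or pulse_rate > ht:
        return "P"
    return "N"


def undersample(records, label_of, seed=0):
    """Randomly drop majority-class records until both classes match in size."""
    pos = [r for r in records if label_of(r) == "P"]
    neg = [r for r in records if label_of(r) == "N"]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return minority + rng.sample(majority, len(minority))


LT, HT = 50, 100  # hypothetical cut-offs in beats per minute, not the study's values
labels = [classify(x, LT, HT) for x in (45, 72, 110)]
print(labels)

# Toy records (pulse rate, ECG outcome): one P and nine N.
data = [(45, "P")] + [(70 + i, "N") for i in range(9)]
balanced = undersample(data, label_of=lambda r: r[1])
print(len(balanced))
```

Balancing should be applied to the training folds only, so that the test data keep the original class proportions.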


Improved logistic
• Dummy categorical variable:
  • =0 if LT