Understanding the patterns in Big Data ‘dark matter’ with GT data mining
By: Eng. Edith Ohri, Codata Conference in Tel-Aviv, 25-27 Feb 2013
Even when patterns are visible, their causes, controls & alerts stay unknown.
25-27 Feb 2013
Principles and examples of GT Data Mining for the Codata conference - Remote Health Monitoring For The Elderly © All Rights Reserved to Edith Ohri
And while data are abundant, they are too entangled to use. With all the tech advances, we are actually less in control today …
In this presentation I’ll try to demystify the “problem” of big data analysis:
• In principle
• And in practice, through 2 examples provided by our colleagues, Prof. Yitshal Berner and James McGee.
Problem #1: Data cleaning is no longer feasible
Also, cleaning changes the data and leads to wrong conclusions (in GT, cleaning is forbidden).
Problem #2: Statistical sampling is no longer feasible
Statistical sampling demands a-priori information that is impossible to obtain, such as variable interrelations, their distributions and prevalence. It simply cannot be done.
Problem #3: We need searching, not testing
- The point in GT is searching for hypotheses, not assuming & testing them.
- The search has to be computer-based; humans cannot grasp the complexities of big data.
We need to clear the data ‘dark matter’
We need to clear up the ‘dark matter’ in data, to enable a better view of the remaining missing parts.
GT alternative solution
GT (Group Technology) data mining is a search model that uncovers the hierarchical clusters in data and creates hypotheses. It is unique in its ability to discover hidden, rare and dynamic patterns.
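The slides do not describe GT's own search algorithm. Purely to illustrate what "hierarchical clusters" means, the minimal sketch below uses plain single-linkage agglomerative clustering, a standard textbook technique swapped in here and not GT itself. The data values and the function name single_linkage are made up for the example.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D values (illustration only, not GT).

    Returns the merges in the order they happen. Each merge is
    (cluster_a, cluster_b, distance), with clusters as sorted tuples.
    """
    clusters = [(p,) for p in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical values; small groups merge first, then groups of groups.
values = [1.0, 1.5, 9.0, 10.0, 30.0]
for a, b, d in single_linkage(values):
    print(a, "+", b, "at distance", d)
```

Reading the merges from first to last gives the hierarchy: tight groups form at small distances, and the outlying value joins only at the very end.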
How does GT respond to the challenge?
• Huge size
• Complexity
• Unknown relations
• Checking the full Solution Space
GT focus:
• Hierarchical relations
• Identify multi-variable factors
• Root causes
• Hidden patterns
• Provision of new hypotheses + Validation
• Risk factors detection & control
• Prevention - Early indicators
2 examples
1) Defining patterns of behavior in data from the sensor-equipped house of an elderly person.
2) Finding risk factors for falls in data from a hospital.
1st example: Find sensors’ patterns of behavior
Objective: To characterize an elderly person’s behavior, in order to enable follow-up on changes.
Given: 2167 sensor events (only 1 week), with 14 variables. Medical information is not included here.
Special problem: The person has visitors.
GT solution:
(a) Characterize the patterns of behavior and key factors.
(b) Identify the patterns that belong to a specific person.
1st example: Find sensors’ patterns of behavior - Early Detection (the holy grail)
1. Identify early indicators in order to produce alerts.
To do that, the solution must be able to:
2. Define patterns of behavior.
3. Overcome data quality issues, e.g. outliers, mismatches etc.
4. Integrate the data from various sensors/devices.
5. Combine medical information with the sensor data stream (here the medical information was unavailable, so “the level of activity” was used instead as a measure of wellbeing).
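As a rough illustration of how an activity level could feed an alert, the sketch below flags a day whose activity deviates strongly from the preceding days. This is a simple z-score rule chosen for the example, not GT's detection method. The function name activity_alerts, the 5-day baseline, the threshold of 2.0 and the sample values are all assumptions.

```python
from statistics import mean, stdev


def activity_alerts(daily_activity, baseline_days=5, z_threshold=2.0):
    """Return the indices of days that deviate from their own recent baseline.

    daily_activity: activity level per day (e.g. sensor-active minutes).
    baseline_days and z_threshold are arbitrary illustrative settings.
    """
    alerts = []
    for i in range(baseline_days, len(daily_activity)):
        window = daily_activity[i - baseline_days:i]
        mu, sigma = mean(window), stdev(window)
        if sigma == 0:
            continue  # flat baseline: skip rather than divide by zero
        if abs(daily_activity[i] - mu) / sigma >= z_threshold:
            alerts.append(i)
    return alerts


# Hypothetical activity levels; day 6 shows a sharp drop.
levels = [120, 130, 125, 118, 127, 124, 60, 122]
print(activity_alerts(levels))
```

A rule like this only reacts after the level has already changed; finding indicators that appear before the change is the harder problem the slide calls the holy grail.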
1st example: Find sensors’ patterns - with GT
GT finds that the person’s most indicative pattern happens on Saturday, with 16+ min of sensor activity, 7+ min time-between-sensors, and the Den as the favored place.
[Figure: Correlations between 2 patterns]
[Figure: Various patterns found by GT]
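To make the pattern above concrete, the sketch below derives the four named quantities (day of week, sensor activity duration, time between sensors, room) from a few made-up events and checks each event against the thresholds quoted on the slide. The record layout (start, end, room) is an assumption, since the 14 variables are not listed, and "time-between-sensors" is interpreted here as the gap between the end of one event and the start of the next.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical events: (sensor start, sensor end, room).
events = [
    (datetime(2013, 2, 2, 9, 0), datetime(2013, 2, 2, 9, 18), "Den"),
    (datetime(2013, 2, 2, 9, 26), datetime(2013, 2, 2, 9, 30), "Kitchen"),
    (datetime(2013, 2, 2, 9, 40), datetime(2013, 2, 2, 10, 0), "Den"),
]


def event_features(events):
    """Return one feature dict per event, in chronological order."""
    features = []
    previous_end = None
    for start, end, room in sorted(events, key=lambda e: e[0]):
        gap = None
        if previous_end is not None:
            gap = (start - previous_end).total_seconds() / 60
        features.append({
            "weekday": DAYS[start.weekday()],
            "activity_min": (end - start).total_seconds() / 60,
            "gap_min": gap,  # None for the first event
            "room": room,
        })
        previous_end = end
    return features


def matches_pattern(f):
    """Thresholds as quoted on the slide: Saturday, 16+ min, 7+ min, Den."""
    return (f["weekday"] == "Saturday"
            and f["activity_min"] >= 16
            and f["gap_min"] is not None and f["gap_min"] >= 7
            and f["room"] == "Den")


for f in event_features(events):
    print(f, matches_pattern(f))
```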
1st example: Find sensors’ patterns - GT accuracy
GT results are 3 times more accurate than those of statistics.
GT searches for results that are both accurate & consistent. Note that the search passes local optima without stopping.
2nd example - falls risk factors (Hospital DB)
Objective: To find fall factors, based on data from a hospital’s hip-surgery department.
Given: 96 patient records (48 records serve for Learning and 48 for Testing), with 88 variables. Falls history and follow-up are not included.
Special problems: More variables than records! Also, 1/8 of the variables are not numbers, and are too “thin” (many 0’s).
GT solution:
(a) Clustering and stability validation.
(b) Finding relations among group characteristics.
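One common way to read "clustering and stability validation" with a Learning/Testing split is to learn the groups on one half and check that the other half falls into the same groups in similar proportions. The sketch below does this with plain k-means on a single synthetic variable. It is a generic stand-in, not GT's procedure: the real data has 88 variables, and the records, function names and the choice of k-means are all assumptions.

```python
def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on 1-D data (k >= 2); returns sorted centroids."""
    lo, hi = min(values), max(values)
    # start with centroids spread evenly between the min and the max
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # an empty group keeps its previous centroid
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return sorted(centroids)


def shares(values, centroids):
    """Fraction of values assigned to each centroid."""
    counts = [0] * len(centroids)
    for v in values:
        counts[nearest(v, centroids)] += 1
    return [n / len(values) for n in counts]


# Synthetic stand-in for 96 records: two well-separated groups of values.
records = [float((i % 2) * 10 + (i % 5)) for i in range(96)]
learning, testing = records[:48], records[48:]

centroids = kmeans_1d(learning)
learn_shares = shares(learning, centroids)
test_shares = shares(testing, centroids)
drift = max(abs(a - b) for a, b in zip(learn_shares, test_shares))
print(learn_shares, test_shares, drift)
```

A small drift between the two halves suggests the groups are stable; a large one suggests they were an artifact of the Learning records.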
2nd example - falls risk factors: Can GT help prevention?
Indicators are factors that appear earlier. GT detects them by their typical characteristics. Indicators are useful for creating alerts and initiating preventive treatment.
2nd example - falls risk factors: GT clusters’ characteristics
High-risk women; about 1/3 of the patients; have typically broken the hip once before; suffer from pains; are prone to Pneumonia and are typically treated with Benzod and Diuretic.
Individuals with bad metabolism, especially Diabetics; about 1/2 of the patients. Diabetes is a known factor of falls/hip injuries.
Relatively healthy women with no treatment against Osteoporosis. Additional typical causes in this group: Previous falls, Low Potassium and Parkinson.
General related factors: Changes in Skeleton, Pains, Calcium blockers, and a low score in the ‘drawing a clock’ test (DACL).
GT use for remote health monitoring
a. Individual monitoring of elderly people’s health.
b. Hospital research.
c. Automated health care systems.
d. “Crowd sourcing” patient feedback analytics.
Further vision: Social networks
• Create new value for health networks
• Disease outbreak control
• Special communities, hidden needs
• Continuous learning & education
Thanks • Acknowledgment to Prof. Yitshal Berner
Edith Ohri Dat@lert GT data Mining
[email protected]