Understanding the patterns in Big Data ‘dark matter’ with GT data mining

By: Eng. Edith Ohri
Codata Conference in Tel-Aviv, 25-27 Feb 2013

Even when patterns are visible, their causes, controls and alerts stay unknown.

25-27 Feb 2013

Principles and examples of GT Data Mining for the Codata conference - Remote Health Monitoring For The Elderly © All Rights Reserved to Edith Ohri

And while data are in abundance, they are too entangled to use. With all the tech advances, we are actually less in control today …

In this presentation I’ll try to demystify the “problem” of big data analysis:
• In principle
• In practice, through 2 examples provided by our colleagues, Prof. Yitshal Berner and James McGee.

Problem #1: Data cleaning is no longer feasible
- Also, cleaning changes the data and leads to wrong conclusions (in GT, cleaning is forbidden).

Problem #2: Statistical sampling is no longer feasible
- Statistical sampling demands a-priori information that is impossible to obtain, such as variable interrelations, their distributions and their prevalence. It simply cannot be done.

Problem #3: We need searching, not testing
- The point in GT is searching for hypotheses, not assuming and testing them.
- The search has to be computer based; humans cannot grasp the complexities of big data.

We need to clear the data ‘dark matter’
We need to clear up the ‘dark matter’ in data, to enable a better view of the remaining missing parts.

GT alternative solution
GT (Group Technology) data mining is a search model that uncovers the hierarchical clusters in data and creates hypotheses. It is unique in its ability to discover hidden, rare and dynamic patterns.
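
As a generic illustration of what a hierarchy of clusters looks like, here is a minimal sketch of standard agglomerative (single-linkage) clustering. It is not the GT algorithm, which these slides do not specify; the function name `agglomerate` and the toy points are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    Generic single-linkage clustering, shown only to illustrate a cluster
    hierarchy. It is NOT the GT algorithm, which the slides do not describe.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []

    def linkage(pair):
        # Single linkage: cluster distance = distance of the closest members.
        a, b = pair
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters.remove(a)
        clusters.remove(b)
        merged = tuple(sorted(a + b))
        clusters.append(merged)
        history.append(merged)
    return history


# Hypothetical toy data: two small groups plus one distant record.
points = [(0, 0), (1, 0), (10, 10), (10, 12), (50, 50)]
history = agglomerate(points)
for level, members in enumerate(history, start=1):
    print(level, members)
```

Each entry in `history` is a cluster formed by merging two smaller ones, so reading the list from top to bottom walks up the hierarchy.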

How does GT respond to the challenge?

The challenge:
• Huge size
• Complexity
• Unknown relations
• Checking the full solution space

GT focus:
• Hierarchical relations
• Identify multi-variable factors
• Root causes
• Hidden patterns
• Provision of new hypotheses + validation
• Risk factors detection & control
• Prevention - early indicators

2 examples
1) Defining patterns of behavior in data from the sensor-equipped house of a person of advanced age.
2) Finding fall risk factors in data from a hospital.

1st example: Find sensors’ patterns of behavior
Objective: To characterize an old person’s behavior, in order to enable follow-up on changes.
Given: 2167 sensor events (only 1 week), with 14 variables. Medical information is not included here.
Special problem: The person has visitors.
GT solution:
(a) Characterize the patterns of behavior and key factors.
(b) Identify the patterns that belong to a specific person.

1st example: Find sensors’ patterns of behavior - Early detection (the holy grail)
1. Identify early indicators in order to produce alerts.
To do that, the solution must be able to:
2. Define patterns of behavior.
3. Overcome data quality issues, e.g. outliers, mismatches etc.
4. Integrate the data from various sensors/devices.
5. Combine medical information with the sensor data stream (here the medical information was unavailable, so “the level of activity” was used instead as a measure of wellbeing).
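
One simple way to turn “the level of activity” into an alert is to compare each new day with a baseline period. The sketch below uses a plain mean-and-standard-deviation threshold, which is an assumption for illustration and not the GT method; the function name, the 7-day baseline, the factor `k` and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def activity_alerts(daily_minutes, baseline_days=7, k=2.0):
    """Return indices of days whose activity is more than k standard
    deviations below the mean of the baseline period.

    Illustrative threshold rule only; not the GT early-indicator search.
    """
    baseline = daily_minutes[:baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)
    threshold = mu - k * sigma
    return [
        day
        for day, minutes in enumerate(daily_minutes[baseline_days:], start=baseline_days)
        if minutes < threshold
    ]


# Hypothetical minutes of sensor activity per day (not from the real data set).
daily_minutes = [120, 130, 125, 118, 135, 128, 122, 126, 60, 124]
alerts = activity_alerts(daily_minutes)
print("Days flagged for follow-up:", alerts)
```

A rule like this only reacts once activity has already dropped; the early indicators the slide asks for would have to be found in the patterns that precede such a drop.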

1st example: Find sensors’ patterns - with GT
GT finds that the person’s most indicative pattern happens on Saturday, with 16+ min of sensor activity, 7+ min time-between-sensors, and the Den as the favored place.
[Figure: Correlations between 2 patterns]
[Figure: Various patterns found by GT]
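
Variables such as sensor-activity minutes and time-between-sensors have to be derived from the raw event stream. A minimal sketch of that derivation follows, assuming each event carries a start time, an end time and a room; this layout, the function name and the sample events are hypothetical, since the slides do not list the 14 variables.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
FMT = "%Y-%m-%d %H:%M"

# Hypothetical events: (sensor start, sensor end, room). Not the real data.
events = [
    ("2013-02-02 09:00", "2013-02-02 09:18", "Den"),
    ("2013-02-02 09:26", "2013-02-02 09:30", "Kitchen"),
    ("2013-02-03 08:00", "2013-02-03 08:05", "Den"),
]


def derive_features(events):
    """Return one dict per event with weekday, room, duration and the gap
    since the previous event ended (both in minutes)."""
    rows, previous_end = [], None
    for start_text, end_text, room in events:
        start = datetime.strptime(start_text, FMT)
        end = datetime.strptime(end_text, FMT)
        gap = None
        if previous_end is not None:
            gap = (start - previous_end).total_seconds() / 60
        rows.append({
            "weekday": DAYS[start.weekday()],
            "room": room,
            "duration_min": (end - start).total_seconds() / 60,
            "gap_min": gap,  # None for the first event in the stream
        })
        previous_end = end
    return rows


features = derive_features(events)
for row in features:
    print(row)
```

A pattern like the one on this slide is then a combination of conditions over such derived columns, for example weekday, duration, gap and room taken together.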

1st example: Find sensors’ patterns - GT accuracy
GT results are 3 times more accurate than those of statistics.

GT searches for results that are both accurate and consistent. Note that the search passes local optima without stopping.

2nd example: Fall risk factors (hospital DB)
Objective: To find fall factors, based on data from a hospital’s hip-surgery department.
Given: 96 patient records (48 records serve for learning and 48 for testing), with 88 variables. Falls history and follow-up are not included.
Special problems: More variables than records! Also, 1/8 of the variables are not numeric, and are too “thin” (many 0’s).

GT solution:
(a) Clustering and stability validation.
(b) Finding relations among group characteristics.
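
The two data issues named above, a learning/testing split of 96 records and “thin” variables, can be sketched as follows. How the records were actually split is not stated, so the seeded shuffle, the 0.9 zero-fraction threshold, the function names and the synthetic records are all assumptions.

```python
import random


def split_half(records, seed=0):
    """Shuffle reproducibly and split into equal learning and testing halves.

    The shuffle is an assumption; the slides do not say how the split was made.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def thin_variables(records, max_zero_fraction=0.9):
    """Return the names of variables whose share of zeros exceeds the threshold."""
    return [
        name
        for name in records[0]
        if sum(1 for r in records if r[name] == 0) / len(records) > max_zero_fraction
    ]


# Synthetic stand-in for the 96 patient records (the real data is not in the slides).
records = [
    {"age": 70 + i % 20, "rare_drug": 1 if i % 32 == 0 else 0}
    for i in range(96)
]

learning, testing = split_half(records)
thin = thin_variables(records)
print(len(learning), len(testing), thin)
```

Flagging thin variables here is only a diagnostic; since the deck forbids cleaning in GT, the sketch reports them rather than dropping them.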

2nd example: Fall risk factors - Can GT help prevention?
Indicators are factors that appear earlier. GT detects them by their typical characteristics. Indicators are useful for creating alerts and initiating preventive treatment.

2nd example: Fall risk factors - GT clusters’ characteristics

• High-risk women: about 1/3 of the patients; have typically broken the hip once before; suffer from pains; are prone to pneumonia and are typically treated with Benzod and Diuretic.
• Individuals with bad metabolism, especially diabetics: about 1/2 of the patients. Diabetes is a known factor of falls/hip injuries.
• Relatively healthy women with no treatment against osteoporosis. Additional typical causes in this group: previous falls, low potassium and Parkinson’s.

General related factors: changes in skeleton, pains, calcium blockers, and a low score in the ‘drawing a clock’ test (DACL).

GT use for remote health monitoring
a. Individual monitoring of old people’s health.
b. Hospital research.
c. Automated health care systems.
d. “Crowd sourcing” patient feedback analytics.

Further vision: Social networks
• Create new value for health networks
• Disease outbreak control
• Special communities, hidden needs
• Continuous learning & education

Thanks
• Acknowledgment to Prof. Yitshal Berner

Edith Ohri
Dat@lert GT Data Mining
[email protected]
