Three Data Partitioning Strategies for Building Local Classifiers
Indrė Žliobaitė
TU Eindhoven
September 20, 2010
Set up
● Ensembles: the training set for each member is formed by a randomized procedure
● Evaluation of the competence of each member: an assigned region of competence, via a deterministic procedure
Set up
Specific types of ensembles, which:
● partition the data into non-intersecting regions
● train one classifier per partition
● use the classifier assignment for the final decision
[Figure: the data space split into five regions, with Classifiers 1-5 each responsible for one region]
Set up
● We will explore three data partitioning strategies
● We will build a meta ensemble consisting of local experts
Motivation:
● divide and conquer
● use different views of the same learning problem
● assess the impact of class labels on the partitions
● building blocks for handling contexts / concept drift
Partitioning
Three partitioning techniques:
● cluster the input data
● cluster each class separately
● partition based on a selected feature
Toy data
Clustering all (CLU)
● Cluster the input data
● Build classifiers
● Select the relevant classifier
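A minimal sketch of the CLU strategy; k-means, logistic regression, and the hypothetical helper names train_clu / predict_clu are illustrative choices, not prescribed by the slides, and each cluster is assumed to contain examples of both classes:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def train_clu(X, y, k):
    # Cluster the input data (labels are ignored at this step).
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Build one classifier per partition; assumes every cluster
    # contains examples of both classes.
    experts = {c: LogisticRegression().fit(X[km.labels_ == c],
                                           y[km.labels_ == c])
               for c in range(k)}
    return km, experts

def predict_clu(km, experts, X):
    # Select the relevant classifier: the expert of the nearest centroid.
    regions = km.predict(X)
    return np.array([experts[r].predict(x.reshape(1, -1))[0]
                     for r, x in zip(regions, X)])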
Clustering within classes (CLU2)
● Cluster the first class (clusters A, B)
● Cluster the second class (clusters C, D)
● Build the classifiers pairwise, one cluster from each class (A-C, A-D, B-C, B-D)
● Select the two closest clusters = the relevant classifier
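A sketch of CLU2 under the same illustrative assumptions (k-means per class, Euclidean distance to centroids, binary labels 0/1; the helper names are hypothetical):

import numpy as np
from itertools import product
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def train_clu2(X, y, k):
    # Cluster each class separately.
    km = {c: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[y == c])
          for c in (0, 1)}
    # Build one classifier per pair of clusters, one from each class.
    experts = {}
    for i, j in product(range(k), range(k)):
        Xi = X[y == 0][km[0].labels_ == i]
        Xj = X[y == 1][km[1].labels_ == j]
        Xp = np.vstack([Xi, Xj])
        yp = np.r_[np.zeros(len(Xi)), np.ones(len(Xj))]
        experts[(i, j)] = LogisticRegression().fit(Xp, yp)
    return km, experts

def predict_clu2(km, experts, X):
    preds = []
    for x in X:
        # Select the two closest clusters (one per class)
        # = the relevant pairwise classifier.
        i = np.argmin(np.linalg.norm(km[0].cluster_centers_ - x, axis=1))
        j = np.argmin(np.linalg.norm(km[1].cluster_centers_ - x, axis=1))
        preds.append(experts[(i, j)].predict(x.reshape(1, -1))[0])
    return np.array(preds)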
Partitioning based on a feature (FEA)
● Slice the data and build classifiers
● Select the relevant classifier
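A sketch of FEA; equal-frequency slicing and a caller-chosen feature index are assumptions, since the slides do not say how the feature or the slice boundaries are picked:

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fea(X, y, feature, k):
    # Slice the data into k equal-frequency intervals of the selected
    # feature (equal-frequency is an assumption, not the slides' rule).
    edges = np.quantile(X[:, feature], np.linspace(0, 1, k + 1)[1:-1])
    slices = np.digitize(X[:, feature], edges)
    # Build one classifier per slice; assumes every slice
    # contains examples of both classes.
    experts = {s: LogisticRegression().fit(X[slices == s], y[slices == s])
               for s in range(k)}
    return edges, experts

def predict_fea(edges, experts, X, feature):
    # Select the relevant classifier: the one trained on the matching slice.
    slices = np.digitize(X[:, feature], edges)
    return np.array([experts[s].predict(x.reshape(1, -1))[0]
                     for s, x in zip(slices, X)])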
Experiments
Experiments
● CLU, CLU2, FEA, and the meta ensemble (MMM; see the sketch below)
● Baselines: naive (NAI), random partitioning (RAN), and no partitioning (ALL)
● Classification datasets from various domains:
  ● dimensionalities 7-58
  ● sizes 500-44,000
  ● two classes
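The slides do not state how the meta ensemble MMM combines the three strategies; a plain majority vote over their predictions is one natural sketch (an assumption, not the authors' stated rule):

import numpy as np

def predict_mmm(pred_clu, pred_clu2, pred_fea):
    # Majority vote over the three local-expert ensembles (assumed rule).
    votes = np.stack([pred_clu, pred_clu2, pred_fea])
    # With two classes (labels 0/1), at least 2 of 3 votes decide.
    return (votes.sum(axis=0) >= 2).astype(int)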
Intuition
● Partitioning makes sense if CLU, CLU2, FEA < ALL (testing error)
● Small sample size problem if ALL [...]

Results (methods ranked per dataset)
● [...]: NAI > ALL > CLU > RAN > CLU2 > [...]
● Shut: FEA > MMM > CLU2 > CLU > RAN > ALL > NAI
● Marc: MMM > FEA > CLU > CLU2 > ALL > RAN > NAI
● Spam: MMM > CLU > FEA > RAN > CLU2 > ALL > NAI
● Elec: MMM > CLU > RAN > FEA > CLU2 > ALL > NAI
● Chess: MMM > CLU > ALL > CLU2 > RAN > FEA > NAI
How many partitions?
[Figure: testing error vs. number of partitions (k = 0-10) on the 'shut' and 'elec' data; one curve per method: ALL, CLU, CLU2, FEA, RAN, MMM]
Summary
● Better with more partitions, but there is a risk of a too-small training sample per partition