Data Fusion

Data Fusion: A reliable solution for enriching existing data sources

Presented by: Rob Anderson
On behalf of: Softcopy

The Problem Facing Marketers
• Although the amount of data available to marketers is growing at an exponential rate, it is becoming fragmented at an even faster rate.
• The result: barriers to analysis and missed business opportunities.
• The marketer’s response: try to obtain all or most of the required consumer information from a single piece of research, to avoid having to reconcile different subsets/samples.
• The consequences: lengthy surveys that force compromises because of the extra load placed on respondents:
  ▪ Lower response rates.
  ▪ Less detail per response.
  ▪ Increased cost.

The Future of AMPS in South Africa
• From 2016, AMPS (the All Media and Products Survey) becomes smaller, more fragmented and less comprehensive.
• It will take the form of an establishment survey with which the other media research surveys will be fused.
• This is likely to make analysis of AMPS more challenging, because the data sources will be more fragmented.

Data Fusion as a Solution
• Respondent-level matching of data sets, using common characteristics to paint a broader picture of the consumer across surveys.
• Predicts the most probable answers to survey questions a respondent was never asked, thereby creating an additional “virtual survey” for each respondent.
• What does this mean?

The Benefits Fusion Offers
• Cost-effectiveness and convenience: no need for additional (expensive) primary research, and fusion can be performed on demand.
• Saves time on planning and undertaking research.
• Provides a higher return on investment in research by making survey data go further.
• Surveys can be split up, which:
  ▪ Reduces strain on respondents.
  ▪ Improves response rates.
  ▪ Improves the depth and quality of responses.

Who Is Using Data Fusion?
• Who is using it, and where?
  ▪ Syndicated fusion products are popular in the Americas.
  ▪ Widely used as a media planning tool, particularly in Europe.
  ▪ Market research agencies provide insight into advertising effectiveness by cross-tabulating product usage with media exposure.
  ▪ South African JICs will implement fusion from 2016.
• Fusion remains exclusively in the realm of fusion specialists.
• It is a black-box process kept out of the domain of the end user.

One Concept, Different Approaches
• The field was pioneered in the 1970s, yet there is still no single unified approach to Data Fusion.
• The underlying concept: find the respondents in the “donor” set that most closely resemble those in the “receiver” set, then fuse their records.
• The different methodologies used by specialists include:
  ▪ Genetic algorithms
  ▪ Predictive isotonic fusion
  ▪ Statistical matching using distance measures
  ▪ Other theoretical approaches…

iFusion as a Solution
• Objectives of the iFusion approach for overcoming the barriers to implementation:
  ▪ Reveal the mysteries of the fusion “black box”.
  ▪ Provide a data fusion solution accessible to anyone, not just specialists.
  ▪ Make the process as close to instantaneous as possible.
  ▪ Show that disparate data sets can be reliably converted into a unified data source.

Fusion In a Nutshell

Approach to Data Fusion
• Fusion methodology:
  ▪ Assign numerical values to each linking variable.
  ▪ Calculate the distances (differences) between respondents in the two data sets using the Gower distance, which accounts for mixed variable types (interval, ordinal, binary, etc.).
  ▪ Create a distance index (scorecard) that ranks the distances between respondents.
  ▪ Match each respondent with the respondent in the opposing data set at the smallest distance from them, i.e. the most alike.
  ▪ Fuse the matched respondents’ records to fill in the “missing”/virtual responses.
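The methodology above can be sketched in Python. This is a minimal illustration under stated assumptions, not iFusion's implementation: records are plain dicts, ordinal variables are coded as integer ranks, binary variables are treated as nominal, all linking variables carry equal weight and nothing is missing. The function names and example fields (such as `listens_to_radio`) are hypothetical.

```python
# Minimal sketch: Gower distance on linking variables, a per-receiver distance
# index (scorecard), nearest-donor matching and fusion of target variables.
# Assumptions (not from the slides): records are dicts, ordinal variables are coded
# as integer ranks, binary variables are treated as nominal, variables are equally
# weighted and there are no missing values. All names are hypothetical.

INTERVAL, ORDINAL, NOMINAL = "interval", "ordinal", "nominal"


def variable_ranges(records, var_types):
    """Range (max - min) of every interval/ordinal variable across all records."""
    ranges = {}
    for var, kind in var_types.items():
        if kind != NOMINAL:
            values = [r[var] for r in records]
            ranges[var] = max(values) - min(values)
    return ranges


def gower_distance(a, b, var_types, ranges):
    """Average of per-variable distances, each scaled to [0, 1]."""
    total = 0.0
    for var, kind in var_types.items():
        if kind == NOMINAL:
            d = 0.0 if a[var] == b[var] else 1.0  # simple mismatch
        else:
            r = ranges[var]
            # Range-normalised absolute difference; for integer ranks this matches the
            # usual ordinal rescaling when both extreme ranks occur in the data.
            d = abs(a[var] - b[var]) / r if r > 0 else 0.0
        total += d
    return total / len(var_types)


def distance_index(receivers, donors, var_types):
    """For each receiver, (distance, donor index) pairs ranked from most to least alike."""
    ranges = variable_ranges(receivers + donors, var_types)
    return [
        sorted((gower_distance(rec, don, var_types, ranges), j) for j, don in enumerate(donors))
        for rec in receivers
    ]


def fuse_nearest(receivers, donors, var_types, target_vars):
    """Copy the target variables of each receiver's closest donor into a fused record."""
    fused = []
    for rec, ranked in zip(receivers, distance_index(receivers, donors, var_types)):
        _, best = ranked[0]
        fused.append({**rec, **{v: donors[best][v] for v in target_vars}})
    return fused


receivers = [
    {"age": 25, "life_stage": 1, "gender": "F"},
    {"age": 60, "life_stage": 4, "gender": "M"},
]
donors = [
    {"age": 58, "life_stage": 4, "gender": "M", "listens_to_radio": True},
    {"age": 27, "life_stage": 1, "gender": "F", "listens_to_radio": False},
]
var_types = {"age": INTERVAL, "life_stage": ORDINAL, "gender": NOMINAL}
fused = fuse_nearest(receivers, donors, var_types, ["listens_to_radio"])
```

Each receiver keeps its own linking variables and gains the closest donor's target answers as virtual responses; ties in the index are broken by donor order.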

Gower Distance
[Slide: Gower distance formulas for the aggregate measure and for interval-scaled, ordinal, nominal and binary variables]
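The slide's formulas did not survive extraction. For reference, the commonly used form of Gower's general dissimilarity coefficient is given below; it is the textbook version, and the exact variant used by iFusion (for example its variable weights w_k) is not stated in the slides.

```latex
d(i,j) = \frac{\sum_{k=1}^{p} w_k \, \delta_{ijk} \, d_{ijk}}{\sum_{k=1}^{p} w_k \, \delta_{ijk}}

\text{Interval-scaled: } d_{ijk} = \frac{|x_{ik} - x_{jk}|}{R_k}, \qquad R_k = \max_h x_{hk} - \min_h x_{hk}

\text{Ordinal: } z_{ik} = \frac{r_{ik} - 1}{M_k - 1}, \qquad d_{ijk} = |z_{ik} - z_{jk}|

\text{Nominal / symmetric binary: } d_{ijk} = \begin{cases} 0 & \text{if } x_{ik} = x_{jk} \\ 1 & \text{otherwise} \end{cases}
```

Here δ_ijk = 1 when variable k is observed for both respondents i and j (0 otherwise), R_k is the range of interval variable k, r_ik is the rank of respondent i on ordinal variable k and M_k its number of ordered categories. With all w_k = 1 the aggregate is a simple average of per-variable distances, each in [0, 1].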

Data Fusion Considerations
• Appropriateness of fusion:
  ▪ Would it be quicker and more cost-effective simply to conduct another survey?
• Knowledge of the data sets to be fused and their purpose:
  ▪ Sample size, weights, nature of the survey, media currencies…
  ▪ An important consideration when choosing between constrained and unconstrained matching.
• Try to ensure that the data sets represent approximately the same “universe”.
• Selection of critical variables, such as gender:
  ▪ Avoids situations such as males having been “pregnant last year”.
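One common way to enforce critical variables is to treat them as hard constraints: a donor is only eligible if it matches the receiver exactly on every critical variable, and the nearest eligible donor is then chosen. The sketch below assumes that approach; the function names, the `age_gap` distance and the example fields are hypothetical.

```python
# Sketch: critical variables as hard constraints on matching.
# Hypothetical names; any distance function (e.g. Gower) can be passed in.


def eligible_donors(receiver, donors, critical_vars):
    """Indices of donors that match the receiver exactly on all critical variables."""
    return [
        j for j, donor in enumerate(donors)
        if all(donor[v] == receiver[v] for v in critical_vars)
    ]


def match_with_critical(receiver, donors, critical_vars, distance):
    """Index of the closest eligible donor, or None if no donor shares the critical cell."""
    candidates = eligible_donors(receiver, donors, critical_vars)
    if not candidates:
        return None  # the caller decides how to handle an empty cell
    return min(candidates, key=lambda j: distance(receiver, donors[j]))


def age_gap(r, d):
    """Toy distance on a single linking variable."""
    return abs(r["age"] - d["age"])


receiver = {"gender": "M", "age": 30}
donors = [
    {"gender": "F", "age": 30, "pregnant_last_year": True},
    {"gender": "M", "age": 36, "pregnant_last_year": False},
]

print(match_with_critical(receiver, donors, [], age_gap))          # 0: the female donor
print(match_with_critical(receiver, donors, ["gender"], age_gap))  # 1: the male donor
```

Without gender as a critical variable, the closest donor by age is female and the male receiver would inherit “pregnant last year”; with the constraint, only the male donor is eligible.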

Constrained
• What it does:
  ▪ Preserves the media currencies, proportions and target group indices/ordering of the original databases being fused.
• How it does it:
  ▪ Calculates the adjustments to respondent numbers in the donor data set needed to ensure a 1:1 respondent match and to account for differences in weights.
• Real-world application:
  ▪ Fusion of AMPS and RAMS to match radio audience data with media and product usage data.
• Additional usefulness:
  ▪ Community radio: small samples are accurately padded with virtual respondents, allowing more robust statistical testing.

Approach to Data Fusion
• Constrained matching:
  ▪ Select target and/or critical variables.
  ▪ Calculate distances/costs between respondents based on the target variables.
  ▪ Create an index based on the calculated distances.
  ▪ Create virtual respondents in the donor set by splitting donor weights so that they correspond to receiver weights.
  ▪ Fuse donor and receiver records 1:1.
• In the fused set: N = receiver population size; original proportions remain intact.
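Constrained matching is often framed as a transportation problem: donor weight is split across receivers so that every receiver's weight and every (rescaled) donor's weight is used exactly once. The sketch below assumes a precomputed distance matrix `cost` and uses the greedy least-cost heuristic; it preserves both sets of weights but, unlike a linear-programming solution, is not guaranteed to minimise total distance. Names are hypothetical and this is not claimed to be iFusion's algorithm.

```python
# Sketch of constrained matching by weight splitting (greedy least-cost heuristic).
# Assumptions: cost[i][j] is a precomputed distance (e.g. Gower) between receiver i
# and donor j; donor weights are rescaled to the receiver population total.


def constrained_match(receiver_w, donor_w, cost):
    """Return (receiver, donor, weight) links; each link is one virtual respondent."""
    scale = sum(receiver_w) / sum(donor_w)  # align donor total with receiver population
    r_left = [float(w) for w in receiver_w]
    d_left = [w * scale for w in donor_w]
    pairs = sorted(
        (cost[i][j], i, j) for i in range(len(r_left)) for j in range(len(d_left))
    )
    links = []
    for _, i, j in pairs:  # cheapest pairs first
        w = min(r_left[i], d_left[j])
        if w > 1e-12:  # skip pairs where either side is already used up
            links.append((i, j, w))
            r_left[i] -= w
            d_left[j] -= w
    return links


receiver_w = [2, 1, 1]   # receiver population N = 4
donor_w = [1, 1]         # rescaled to [2.0, 2.0]
cost = [[0.1, 0.9],
        [0.2, 0.3],
        [0.8, 0.4]]
links = constrained_match(receiver_w, donor_w, cost)
for link in links:
    print(link)
# (0, 0, 2.0)
# (1, 1, 1.0)
# (2, 1, 1.0)
```

Donor 1 is split into two virtual respondents of weight 1.0, one for each of receivers 1 and 2, so the fused set totals the receiver population of 4 while each donor still contributes its full rescaled weight.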

Unconstrained
• What it does:
  ▪ Fuses data when preserving exact proportions and media currencies is not important or necessary.
• How it does it:
  ▪ Finds the best respondent matches in the donor set and reuses them multiple times in the receiver set.
  ▪ Each reuse incurs a selection penalty to avoid overusing any one donor respondent.
• Benefits:
  ▪ Fuses data sets more quickly, as the calculations are less computationally intensive.

Approach to Data Fusion
• Unconstrained matching:
  ▪ Select target and/or critical variables.
  ▪ Calculate distances/costs between respondents based on the target variables.
  ▪ Create an index based on the calculated distances.
  ▪ Fuse the best-matching donor record to each receiver; the same donor may be used for multiple receivers.
  ▪ Add a cost to the donor respondent’s score each time it is used, to limit reuse.
• In the fused set: N = number of receiver records; original proportions are disregarded.
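A minimal sketch of the reuse penalty, assuming a precomputed distance matrix `cost` and a hypothetical fixed `penalty` added to a donor's score for each previous use; the slides do not specify how iFusion sizes the penalty.

```python
# Sketch of unconstrained matching with a per-use donor penalty.
# Assumptions: cost[i][j] is a precomputed distance between receiver i and donor j;
# `penalty` is a hypothetical fixed amount added per previous use of a donor.


def unconstrained_match(cost, penalty):
    """Return the donor index chosen for each receiver and how often each donor was used."""
    n_donors = len(cost[0])
    uses = [0] * n_donors
    matches = []
    for row in cost:  # receivers are processed in order
        best = min(range(n_donors), key=lambda j: row[j] + penalty * uses[j])
        uses[best] += 1
        matches.append(best)
    return matches, uses


cost = [[0.1, 0.3],
        [0.1, 0.3],
        [0.1, 0.3]]
print(unconstrained_match(cost, penalty=0.0))   # ([0, 0, 0], [3, 0])
print(unconstrained_match(cost, penalty=0.15))  # ([0, 0, 1], [2, 1])
```

With no penalty, donor 0 serves every receiver; with a penalty of 0.15 its score reaches 0.4 on the third use, so donor 1 (0.3) is chosen instead. Because the penalty accumulates as receivers are processed, the result depends on receiver order.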


Fusion On Demand
• Realistically, target/linking variables will change from one problem to the next.
• A pre-fused data set is not necessarily one-size-fits-all.
• Waiting for a fusion specialist to perform additional fusions takes time.
• To make fusion more accessible and viable, marketers need to be able to create their own custom-fused data sets on demand.
• Custom-fused sets should be tailorable and modifiable at will to address specific research problems.

Reliability of the Process
• Testing the reliability of data fusion:
  ▪ Split-sample validation
• Split the data set into two parts, Part A and Part B, each with its own linking and target variables.
• Fuse Part B to Part A and check how closely the new, fused data set matches the original Part A respondent records in terms of products, media, etc.
• Statistic used to determine accuracy: Bias = the % difference between the original and fused data.
• 0% bias = 100% accuracy.
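The validation procedure above can be sketched end to end. Assumptions, all hypothetical: records are dicts with a `weight` field and a yes/no target; the fusion step is a simple nearest-neighbour match on the number of mismatching linking variables (a stand-in for a full Gower-based fusion); records are split alternately so the example is reproducible (a random split is the more usual choice); and bias is expressed as a percentage of Part A's weighted total, mirroring the base used on the results slide.

```python
# Sketch of split-sample validation with a stand-in nearest-neighbour fusion.
# Hypothetical field names: "weight", "gender", "band", "buys_product".


def split_alternate(records):
    return records[::2], records[1::2]  # Part A (receiver), Part B (donor)


def weighted_total(records, target):
    return sum(r["weight"] for r in records if r[target])


def fuse_target(receivers, donors, linking, target):
    fused = []
    for rec in receivers:
        # First donor with the fewest mismatching linking variables.
        best = min(donors, key=lambda d: sum(d[v] != rec[v] for v in linking))
        fused.append({**rec, target: best[target]})  # replace A's own answer
    return fused


def validate(records, linking, target):
    part_a, part_b = split_alternate(records)
    original = weighted_total(part_a, target)
    fused = weighted_total(fuse_target(part_a, part_b, linking, target), target)
    base = sum(r["weight"] for r in part_a)
    pct_bias = abs(fused - original) / base * 100
    return original, fused, pct_bias, 100 - pct_bias


records = [
    {"gender": g, "band": b, "weight": 1.0, "buys_product": g == "F" and b == "young"}
    for g in ("M", "F") for b in ("young", "old") for _ in range(2)
]
print(validate(records, ["gender", "band"], "buys_product"))  # (1.0, 1.0, 0.0, 100.0)
print(validate(records, ["gender"], "buys_product"))          # (1.0, 2.0, 25.0, 75.0)
```

When the linking variables fully determine the target, the fused Part A reproduces the original exactly (0% bias); dropping `band` from the linking set lets an older female receiver inherit a younger donor's answer, and the bias rises to 25%.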

Testing & Results
• Split-sample/foldover validation:
[Diagram: Original Dataset split into Receiver and Donor datasets; Fuse Datasets; Enriched Dataset]

Testing & Results
• iFusion results:
  ▪ Receivers: 237,552
  ▪ Donors: 240,860
  ▪ Linking variables, based on CHAID analysis: Community, Gender, Life Stage, Age
  ▪ |Bias| = |E(Audience) − θ| = |237,552 − 240,860| = 3,308
  ▪ %Bias = 3,308 / 37,664,537 × 100 = 0.0088% (relative to a base of 37,664,537)
  ▪ %Accuracy = 100% − 0.0088% = 99.9912%
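The arithmetic on the results slide can be reproduced directly. The variable names are illustrative; treating 237,552 as the fused estimate E(Audience) and 240,860 as the reference θ follows the order in the slide's bias formula. The slide divides by 37,664,537, which is far larger than either audience estimate (presumably the total population universe), so the last line also shows the same difference relative to the donor-side estimate for comparison.

```python
# Reproduces the results-slide arithmetic; names are illustrative.
fused_estimate = 237_552   # E(Audience), as given on the slide
reference = 240_860        # θ, as given on the slide
base = 37_664_537          # denominator used on the slide

abs_bias = abs(fused_estimate - reference)
pct_bias = abs_bias / base * 100
pct_accuracy = 100 - pct_bias

print(abs_bias)                               # 3308
print(round(pct_bias, 4))                     # 0.0088
print(round(pct_accuracy, 4))                 # 99.9912
print(round(abs_bias / reference * 100, 2))   # 1.37
```

The headline figures depend on the choice of denominator: relative to the 37,664,537 base the difference is 0.0088%, while relative to the 240,860 audience estimate it is about 1.37%.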

In Conclusion
• The iFusion approach shows that Data Fusion:
  ▪ Is indeed reliable.
  ▪ Can be performed by any marketer or other end user.
  ▪ Can be performed very quickly on basic hardware, saving significant time and money.
• It is the ultimate solution for unifying increasingly fragmented data and helping research go further.
• There has never been a better time for marketers to start adopting Data Fusion to exponentially enhance consumer research.