MARKETING ENGINEERING: THE MODULE SERIES
Direct Market Segmentation Using Customer Needs Software: Cluster Analysis
by Gary L. Lilien and Arvind Rangaswamy
Copyright © 2004 by DecisionPro, Inc. To order copies or request permission to reproduce materials, go to www.decisionpro.biz. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means – electronic, mechanical, photocopying, recording or otherwise – without the permission of DecisionPro, Inc.
T02 Revised 19 July 2004
PREFACE
Marketing Engineering: The Module Series
Since we launched the Marketing Engineering platform in 1997, we have increasingly been asked for individual modules: Text+Case+Software on a single topic. To meet that need, we are pleased to announce the launch of Marketing Engineering: The Module Series. Users can now order individual modules for their own (or class) use, and modules can now be combined into “custom suites” for mass use, including integrated versions of the selected software that can be run on a university’s network. We will continue to support our flagship product, Marketing Engineering, but will be using the Module Series to introduce new cases and new software on an ongoing basis. Our current set of Module offerings is listed below.
We appreciate your interest in Marketing Engineering: The Module Series 2004. For more information, terms and the most recent set of offerings see http://www.mktgeng.com/products/index.html.
The development of this Marketing Engineering module and the entire module series has been made possible by the support of the companies that sponsor Penn State’s Institute for the Study of Business Markets and ISBM’s Executive Director and Marketing Engineering’s chief cheerleader, Ralph Oliva. Much of the work on this project took place at the Australian Graduate School of Management, where Lilien held the Freehill’s Visiting Professorship, and that support is gratefully acknowledged.
The design, development, production and editing of this material required the cheerful and capable work of Stephen Carpenter, B.J. Clitherow, Andrea K. Minnie and Mary Wyckoff. The accompanying software has involved the work of many people over the years, but is chiefly the work of Andrew “Nuke” Stollak and Daniel Soto Zeevaert. Space prohibits us from individually noting the contributions of the many others who have been part of the Marketing Engineering team and to whom we are so deeply indebted.
Gary L. Lilien and Arvind Rangaswamy
State College, PA
May, 2004
T02
Direct Market Segmentation Using Customer Needs
DecisionPro offers the following software modules, which illustrate various Marketing Engineering concepts. Business cases are also available to demonstrate the concepts with a real-world scenario.
Concept | Software Title | Software #
Marketing Portfolio Analysis and Prioritization with the GE/McKinsey Approach | GE Portfolio Planning (ge.xls) | S01
Direct Market Segmentation Using Customer Needs | Cluster Analysis | S02
Choice-Based Segmentation | Choice-Based Segmentation (abb.xls) | S03
Positioning and Perceptual Mapping | Positioning Analysis | S04
Customer Targeting with Choice Models | Multinomial Logit Analysis | S05
Product Design with Conjoint Analysis | Conjoint Analysis | S06
Forecasting with the Bass Diffusion Model | Bass Model (gbass.xls) | S07
Pretest Market Forecasting with the ASSESSOR Model | Assessor Pretest Market Model | S08
Advertising Budget Decisions and ADBUDG | ADBUDG: Advertising Budgeting | S09
Advertising Copy Development and ADCAD | ADCAD: Ad Copy Design | S10
SalesForce Call Planning: The CALLPLAN Model | Generalized Callplan Model | S11
Sales and Marketing Spending and Allocation | ReAllocator (ReAllocator.xls) | S12
Value-in-Use and Value-based Pricing | Value-in-Use Pricing (value.xls) | S13
Competitive Bidding | Competitive Bidding | S14
Revenue Management | Revenue Management for Hotels (revenue.xls) | S15
Promotional Response Models | Promotional Spending Analysis | S16
TABLE OF CONTENTS
Introduction: The Segmentation Process
Segmentation Practice Meets Segmentation Theory
Segmenting Markets: A Five-Step Approach
Segmentation Research Design and Data Collection
  Developing the Measurement Instrument
  Selecting the Sample
  The Unit of Analysis: Selecting and Aggregating Respondents
  Constructing a Database
  Updating and Refreshing the Database
Forming and Profiling Segments
Direct Segmentation Methods
  Reducing the Data with Factor Analysis
  Developing Measures of Association
  Identifying and Removing Outliers
  Forming Segments
  Profiling Segments and Interpreting Results
References
Appendix 1: Factor Analysis for Preprocessing Segmentation Data
Appendix 2: Similarity Measure Issues and Examples
Appendix 3: An Illustration of Ward’s Method for Clustering
EXHIBITS
Exhibit 1: Appropriate Segmentation Bases Depend on Use
Exhibit 2: A Five-Step Approach to Segmenting Markets
Exhibit 3: Common Bases and Descriptors to Segment and Describe Markets
Exhibit 4: Choice-based Segmentation Example for Database Marketing
Exhibit 5: Similarity Data for “Essential Features” Data
Exhibit 6: Dendrograms for Single Linkage and Complete Linkage Clustering
Exhibit 7: Segment Profiles (Snake Chart) for Two Segments
Exhibit 8: Typical Process of Direct Segment Formation
Exhibit A2.1: Similarity Data for “Essential Features” Data
Exhibit A3.1: Summary Calculations for Ward’s ESS Method
Exhibit A3.2: Dendrogram for Ward’s ESS Method
Introduction: The Segmentation Process
Markets are heterogeneous. Customers differ in their values, needs, wants, constraints, beliefs and the incentives that will motivate them to act in a particular way. Products compete with one another in attempting to satisfy the needs and wants of these customers. By segmenting the market, firms can better understand their customers and target their marketing efforts efficiently and effectively. Through segmentation, an organization seeks a middle ground where it does not rely on a common marketing program for all customers, nor does it incur the high costs of developing a unique program for each customer. Two definitions are critical to the concept of segmentation:
• Market segmentation is the process of dividing customers with significantly different valuations of a product or service into groups or segments containing customers whose valuations vary little within the group but vary significantly between groups.
• A market segment is a group of actual or potential customers who can be expected to respond in a similar way to a product or service offer. That is, they want the same types of benefits or solutions to problems from the product or service, or they respond in a similar way to a company’s marketing program.
Three factors provide an opportunity for a firm to successfully segment a market.
• Heterogeneity: First, when customers’ needs differ, some customers will be willing to pay a premium for those products and services that better meet their needs and wants than do standard products.
• Accessibility: Second, although customers may be heterogeneous, a firm should be able to reach groups of customers whose needs are similar to one another’s but different from those of customers in other groups.
• Cost/Effectiveness: Third, serving customers in a segment differently from those in other segments must be cost effective – a firm should be better off financially serving segments differently than it would be serving them in a similar manner. (That analysis must include an assessment of what would happen without segmentation: what are the chances a competitor would segment and serve the market and leave the firm worse off?)
Segmentation Practice Meets Segmentation Theory
Some of the more common characteristics that firms use to classify and target customers are:
• heavy versus light versus non-users of a product category;
• single-brand versus multi-brand users in a product category;
• price sensitive versus non-price sensitive buyers;
• early versus later versus non-adopters of a new product or service;
• internet shoppers versus non-internet shoppers;
• experienced versus novice buyers.
Every manager wants to know the best way to segment the market. Some argue that there is little theory to guide this decision. By nature, segmentation is a family of methods and the role of the manager is to pick one, or a combination, that will work best for the problem at hand. But, given that problem, what should one do? To approach this problem, we need a segmentation model involving two sets of variables: bases and descriptors. The segmentation basis should describe why customers want a particular (type of) product or service. More often than not, these reasons relate to the benefits offered and/or the solutions to problems resolved by the product/service. These motivations are at the heart of why people
will respond differently to the marketing strategies of different companies (that is, why they value offerings differently). Exhibit 1 lists some segmentation bases and how they link to the reason for the segmentation study.
Exhibit 1: The most appropriate segmentation bases depend on the managerial use of the segmentation. There is no single, best segmentation. Source: Wind 1978, p. 320.
Segment descriptors (like age, income, use of the internet and media preferences) are not designed to be grouping variables; rather, they should help marketers deliver different product or service offerings to the customer segments after those segments have been identified. However, in practice, firms often use such descriptive information – demographic (such as age, gender, type of household, etc.) and firmographic (such as size of firm and industry) variables – as part of the segmentation basis because this information is what they have or is easy to collect. In order to use need or demand basis variables, we must collect information about customers’ needs and the situations in which they use the product. Thus we can
segment markets in one of two ways: we can start with characteristics of customers that are easy to identify and see if the resulting customer groups have different needs. Alternatively, we can group customers based on their needs and then search for discriminating characteristics that enable us to identify groups that differ in their needs. The first approach is called convenience-group or backward segmentation, where we form market segments based on how convenient customers are to find. Companies that are heavy users of advertising often use this approach because magazines, newspapers and television stations profile their audiences using such demographic and psychographic data. Many industrial marketing companies also use convenience-group segmentation – relying on easy-to-measure variables such as firm size, industry and geographic location. The trouble with convenience-group segmentation is that demographic characteristics are often very poor predictors of benefit segment membership.
All segmentations are tradeoffs – no overall best segmentation basis exists; the marketing problem, the time available to conduct a segmentation study, the availability of relevant data and other considerations will dictate the appropriate approach. Whatever the approach, you should try to use values, needs and demand-related variables for the segmentation basis and other variables as descriptors. Do not discard customer information, however; keeping non-needs-based data separate will enhance the quality and interpretability of the ultimate segmentation. Following this philosophy, our segmentation software keeps the segmentation and descriptor variables in two separate data files.
Segmenting Markets: A Five-Step Approach
Whenever possible, firms should be proactive in segmenting the market. They should identify differences in customers’ needs, wants and preferences, and then
see if they can design products and strategies to profitably serve these different needs. We suggest a five-step approach (Exhibit 2).
Exhibit 2: A Five-Step Approach to Segmenting Markets
• Step 1: Specify the Reason(s) for the Segmentation: Why are we segmenting?
• Step 2: Select Segmentation Variables
• Step 3: Choose a Procedure to Form Segments
• Step 4: Determine the Appropriate Number of Segments
• Step 5: Select Segment(s) to Target
Step 1 is to explicitly outline the role of market segmentation in the company’s strategy: Why are we segmenting? How will it help the firm to establish a competitive advantage, and what other actions might the firm take to achieve its objectives? A firm should not segment the market without first considering its overall strategic intent and its core competencies.
Step 2 is to select a set of segmentation variables. These variables should be based on some aspect of potential customers’ needs or wants and should reflect differences between customers. To select an appropriate set of segmentation variables, the firm needs intimate knowledge of the factors that drive demand for its products and services. In many consumer markets, the segmentation variables reflect differences among customers in their perceptions of product attributes (functional value), their socioeconomic characteristics (economic value) and their lifestyle characteristics (psychological value). They may also be based more directly on the function of the product (the job it does), its economic value (a great buy), and/or psychological factors (it makes the customer feel good). In industrial markets, the benefits customers seek depend less on the psychological and socioeconomic
characteristics of the individual making the purchase decision and more on the end use of the product and the profitability it generates. Consumer and industrial marketers use similar categories of segmentation variables (Exhibit 3). These variables are used to profile the segments in the market to find actual and potential customers, to understand their purchase motivations, and to understand how best to communicate with them.
Exhibit 3: A list of common bases and descriptors that can be used to segment and describe markets, noting the differences between consumer and industrial variables.
Some of the variables in Exhibit 3 can serve both as bases and descriptors. In some consumer markets, the usage situation will affect the customer benefits, and thus the product features preferred in a product.
A sound data collection strategy includes variables that help the firm
• measure the size, purchasing power and profitability of the segments;
• determine the degree to which the organization can effectively reach and serve the segments; and
• develop effective programs to attract and retain customers.
Step 3 is to choose a procedure to aggregate individual customers into homogeneous groups or segments. Statistics alone will not answer the question sufficiently; managerial issues will be involved. How should we define closeness between pairs of customers? Are customer segments to be discrete (each customer in only one segment), overlapping (a customer can be in two or more groups) or fuzzy (each customer is assigned a degree of membership in all segments)? Assigning each customer to a single segment is easier to understand and to apply, but we may be sacrificing information. Overlapping or fuzzy segments are intuitively more realistic, but it is harder to develop a segmentation strategy in such circumstances: firms tend to position their products broadly to appeal to the different overlapping segments. Such broad positioning often lacks focus and is really no positioning at all.
Step 4 is to develop the right number of segments. Since the decision on the right number of segments blends managerial with statistical issues, there is no objective best solution; the challenge is to choose a decision rule that trades off precision (the more segments the better) against parsimony (the fewer segments the better). Thus, even with the help of statistical analysis, the selection of the right number of segments will involve both art and science.
Step 5 involves a search across those segments to determine which one(s) to target. If the firm chooses too many segments, it may create too many products and services to serve these segments. Costs rise with the number of products that are designed, sold and carried in inventory.
One must be cautious here also, as too few segments open up opportunities for niche marketers to design products that better fit the needs of small but profitable segments. Firms should carry out Steps 4 and 5 iteratively. They should split the (potential) market into two groups, then three groups, then four groups, and so on. Then they should examine each of these segment structures using both managerial and statistical criteria to eliminate any that are statistically or managerially unsuitable.
Segmentation Research Design and Data Collection
While there are many ways to segment markets and many data sources, both internal and external to the firm, we will focus here on a typical formal segmentation research study, based on the collection of primary source data. Such a study consists of five key activities:
1. Developing the measurement instrument (often a customer survey form): What information do we want to collect, and how should we collect it?
2. Selecting a sample: Who are we studying? (Which respondents? Where? When? In what households or organizations?)
3. Specifying the unit of analysis: Who is the focus of the study – an individual, a household, an organization? How can we take different responses from several individuals in a household or an organization and use them to predict how the household or organization will behave?
4. Analyzing the data and segmenting the market: What statistical procedures can we use to segment (potential) customers and to describe aspects of their behavior that are crucial to serving their needs?
5. Developing a procedure to update the process and refresh the data (repeat steps 1-4) regularly.
These topics are covered in much more detail in market research texts; we outline the key issues and illustrate the important points. (Note that even when you are dealing with data from a secondary source – data already collected – steps 1 and 2 may have passed, but you still must follow steps 3 and 4.)
1. Developing the Measurement Instrument
Measurement instruments for segmentation studies are usually designed to collect several types of data (Exhibits 1, 3):
• Demographic descriptors, such as age, income, marital status and education, for individual consumers and industry classification, size (number of employees or sales) and job responsibilities for organizations;
• Psychographic descriptors, such as activities, interests, opinions and lifestyle, for consumer and service markets;
• Decision-making descriptors, such as whether the decision is low or high involvement and, for industrial markets, who the members of the buying group are, how they search for information and how the group reaches a decision;
• Demand, including historical purchases or consumption and anticipated future purchases;
• Purchase motivators, such as problem removal, problem avoidance, normal depletion, incomplete satisfaction, intellectual stimulation, social approval and sensory gratification;
• Needs, which could be stated needs or needs inferred through such methods as conjoint analysis or value-in-use analysis;
• Attitudes, which could be about products, suppliers, risk of purchase or the adoption process in general;
• Media and distribution channel use, such as the types and amount of media used and where products and services are typically bought.
2. Selecting the Sample
For any market research study, the analyst must define the population to be studied (the universe) and the means for gaining access to a representative sample of that universe (the sampling frame). The sample universe might be all U.S. firms; the sampling frame might be the list of those firms furnished by Dun and Bradstreet, and the sample might be a random selection of firms from the Dun and Bradstreet list. For exploratory research or for small-sample studies, using a convenience sample or a judgmental approach to select respondents is often appropriate. For research designed to project to the full market, analysts usually use some form of probability sample, such as
• Simple random sampling, where every member of the sampling frame has an equal chance of being chosen to be a member of the sample;
• Cluster sampling, where the unit of selection is a group (say, all households on a street) and each group has an equal chance of being selected as a member of the sample; and
• Stratified sampling, where the sampling frame is broken into strata, which the user believes are different from one another but whose members are relatively homogeneous, and where simple random sampling within each stratum is used to generate the sample. (In our terminology, these strata are segments.)
Where possible, we recommend some form of stratified sampling, with larger samples taken from more "important" strata (e.g., heavy users, likely brand switchers, larger organizations or highly profitable customers).
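The stratified design recommended above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the "heavy"/"light" usage labels, the per-stratum sample sizes and the function name `stratified_sample` are all hypothetical and do not come from the module's software.

```python
import random

def stratified_sample(frame, stratum_of, sizes, seed=0):
    """Draw a simple random sample of sizes[s] units from each stratum s."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, n in sizes.items():
        members = strata.get(name, [])
        # Never ask for more units than the stratum contains
        sample[name] = rng.sample(members, min(n, len(members)))
    return sample

# Hypothetical sampling frame: (firm_id, usage level); 20 heavy and 80 light users
frame = [(i, "heavy" if i % 5 == 0 else "light") for i in range(100)]

# Oversample the "important" stratum: 10 of 20 heavy users, 10 of 80 light users
sample = stratified_sample(frame, lambda unit: unit[1], {"heavy": 10, "light": 10})
```

Because heavy users are sampled at a higher rate than light users, any projection to the full market would need to reweight the strata by their true shares of the frame.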
3. The Unit of Analysis: Selecting and Aggregating Respondents
In households and social groups, people make many purchases for the group as a whole (vacation spot, entertainment event, etc.), based on the preferences of important members. In organizations, a number of individuals representing different points of view are often involved in purchase decisions. These buying groups often include a purchasing agent (frequently most interested in price, service and on-time delivery), a user (interested in certain specific features), an account manager (involved heavily in managing and maintaining supplier relationships) and an accountant (interested in the impact on the budget and perhaps willing to trade off higher initial costs for savings elsewhere). In choosing a sample, you must consider two key issues:
• How many respondents per unit should you survey?
• If there is more than one respondent per unit, how can you cross-check their responses, and how should you aggregate their responses?
Common sense tells us that if everyone in a household or an organization agrees about their needs, then we need only a single respondent. However, when a firm has little prior experience with the purchase and when the purchase is critically important to the firm, the responses of a single respondent could be misleading. When the needs of those within the group differ, it is important to study the two or three people who have the most influence in the decision and to combine their responses in such a way that the aggregated preference scores are higher for those alternatives that require the least compromise for the individuals involved in the decision process.
4. Constructing a Database
Usually, the data collected in a segmentation study are structured into a data matrix (Exhibit 4); the columns in the matrix correspond to the variables measured, and each row contains the responses of one respondent. Even when a particular study does not organize data in this way, it is a useful way to think about segmentation data.
Exhibit 4: Choice-based segmentation example for database marketing: target those customers whose (expected) profitability exceeds the cost of reaching them by comparing column D with the cost to reach that customer.
Exhibit 4 shows part of a data matrix from a study of needs for organizational use of PCs. In collecting such data and constructing a data matrix, you should address a number of issues:
Q1: Who is the respondent, and how will they be identified in the data matrix (columns 1 and 2 in Exhibit 4)? As just discussed, different respondents in the same household or, more critically, in the same organization may give quite different responses.
Q2: What kind of data are you gathering? Nominal data, such as yes-no, or industry classification data, are not easy to compare to data obtained from rating scales.
Q3: Are the measurement scales the same? If the scales are different (agree-disagree on a 1 to 7 scale vs. estimated demand on a 1 to 10,000 unit scale), you need to employ some form of data standardization.
Q4: Are the variables correlated? Often several variables measure different aspects of the same thing. For example, if "quality of service" and "on-time arrival" mean the same thing to airline customers, perceptions and importance ratings for these items should be combined in some way to avoid double counting.
Q5: How should you handle outliers, that is, unusual respondents? Some outliers represent incorrect data while others may represent unique situations that are better discarded. But some outliers may represent new, emerging segments!
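As an illustration of the standardization issue raised in Q3, the sketch below rescales each column of a small data matrix to mean 0 and standard deviation 1 (z-scores), so that a 1 to 7 agreement scale and a 1 to 10,000 unit demand estimate carry comparable weight. The data values and the function name `standardize_columns` are hypothetical, and z-scoring is only one of several possible standardization choices.

```python
from statistics import mean, pstdev

def standardize_columns(matrix):
    """Return a copy of the data matrix with every column converted to z-scores."""
    columns = list(zip(*matrix))  # transpose: one tuple per variable
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # A constant column carries no information; map it to zeros
        scaled_columns.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back to rows

# Hypothetical respondents: [agreement on a 1-7 scale, estimated demand in units]
data = [
    [2, 500],
    [4, 1500],
    [6, 10000],
    [7, 4000],
]
z = standardize_columns(data)
```

Without this step, distance calculations between respondents would be dominated almost entirely by the demand column.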
5. Updating and Refreshing the Database
Segmentation data age quickly, and a segmentation research program is not complete without a plan for monitoring and refreshing the database. Segmentation is a process, not an individual study, and regular market monitoring and updating are essential for the process to be successful.
Forming and Profiling Segments
Thus far, we have argued that for a market to be segmentable, it should satisfy three managerial criteria: it should be heterogeneous, segments should be accessible, and the process should be cost-effective. Our segmentation methods should help our segmentation satisfy these three criteria through three related technical criteria: homogeneity, identifiability and parsimony.
Homogeneity measures the degree to which the potential customers in a segment have similar needs and values, while heterogeneity measures the degree to which groups of customers differ from each other. Our segmentation methods seek homogeneity within segments and heterogeneity between segments.
Identifiability is the degree to which marketers using observable characteristics (descriptors) of the segments can identify (and reach) segment members; it is a measure of accessibility.
Parsimony is a measure of how well the segmentation scheme derived from the data about (potential) customers trades off the amount of within-group homogeneity against the amount of between-group heterogeneity in a cost-effective way. We must keep these criteria in mind when evaluating the quality of any segmentation analysis.
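One common way to put numbers on within-segment homogeneity and between-segment heterogeneity is to split the total sum of squared distances from the grand centroid into a within-group and a between-group part. The sketch below assumes this sum-of-squares decomposition as the measure; the text does not prescribe a specific formula, and the two small segments are hypothetical data.

```python
def centroid(rows):
    """Mean of each variable across a list of respondents."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sum_of_squares(segments):
    """Return (within, between, total) sums of squares for a segmentation."""
    all_rows = [row for seg in segments for row in seg]
    grand = centroid(all_rows)
    # Small 'within' means members of each segment resemble one another
    within = sum(sq_dist(row, centroid(seg)) for seg in segments for row in seg)
    # Large 'between' means segment centroids lie far from the grand centroid
    between = sum(len(seg) * sq_dist(centroid(seg), grand) for seg in segments)
    total = sum(sq_dist(row, grand) for row in all_rows)
    return within, between, total

# Hypothetical needs ratings for two segments of three respondents each
segments = [
    [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]],
    [[6.0, 7.0], [7.0, 6.0], [6.5, 6.5]],
]
within, between, total = sum_of_squares(segments)
```

The within and between parts always add up to the total, so adding segments can only lower the within part; that is why a parsimony criterion is needed alongside a fit measure.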
Direct Segmentation Methods
We present traditional, direct approaches to creating non-overlapping segments here as they are easy to understand, both conceptually and empirically, and are most widely used. As we have stressed previously, there is no single best segmentation. Nor will the methods we outline here agree on the number and composition of segments. Hence, it is critical to use multiple approaches and cross validate any findings; strong segments emerge regardless of the method and analytical choices while weaker relationships will occur only with specific methods. Cross validation, using several methods, will help management determine what relationships and segments are the most substantive. Ideally, management should agree on the criteria for segment development and evaluation before the analysis begins to ensure that the process is most effective and actionable for the organization. There are five steps involved in applying direct segmentation methods:
1. Reducing the data
2. Developing measures of association
3. Identifying and removing outliers
4. Forming segments
5. Profiling segments and interpreting results
Exhibit 5: Similarity data for “essential features” data: firms A and B match on 6 of their essential feature needs (Y-Y or N-N) out of 8 possible matches.
To prepare for the direct segmentation task, we must assemble the customer data into two data matrices where each row represents information about a particular customer. One data matrix will contain the segmentation basis variables and the other contains the descriptor variables. We use the basis variable data matrix for steps 1-4 and the descriptor data matrix for step 5.
1. Reducing the Data with Factor Analysis
Many segmentation studies collect data on a wide variety of demand and needs-based items. Often many of the items measure similar or interrelated constructs. In subsequent analyses this may lead to misleading conclusions because some data are overweighted and other data are underweighted. It is also important to drop irrelevant variables from the study, since including even a couple of irrelevant variables can inhibit the detection of the segment structure in the data. Factor analysis is one of several methods to reduce a large set of segmentation variables to a smaller set of independent indicator constructs. Specifically, this technique analyzes the interrelationships among a large number of segmentation basis variables and then represents them in terms of common, underlying dimensions (factors); see Appendix 1.
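Factor analysis itself is beyond a short example, but the redundancy it addresses can be seen with a simple correlation screen. The sketch below is not factor analysis: it only computes Pearson correlations between basis variables and lists pairs that move together closely enough to risk double counting. The ratings, the variable names and the 0.8 threshold are hypothetical choices made for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def redundant_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute correlation meets the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical importance ratings from six airline customers (1-7 scale)
ratings = {
    "quality_of_service": [7, 6, 2, 3, 5, 1],
    "on_time_arrival":    [6, 7, 1, 3, 5, 2],
    "low_price":          [2, 5, 4, 6, 1, 3],
}
pairs = redundant_pairs(ratings)
```

In this made-up data only the service and on-time items are flagged, which suggests they would load on a common factor and should be combined or represented by a single construct before clustering.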
2. Developing Measures of Association
Cluster analysis routines require the analyst to define a measure of similarity for every pair of respondents. Similarity measures fall into two categories, depending on the type of data that are available. For scaled data (e.g., How much do you agree or disagree with…) use distance-type measures. For nominal data (e.g., Feature X required/not required) use matching-type measures. Exhibit 5 illustrates the basic idea, and Appendix 2 provides examples.
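The two families of measures can be sketched as follows, assuming Euclidean distance as the distance-type measure and the simple matching coefficient (share of items on which two respondents agree) as the matching-type measure. The yes/no answers for firms A and B are hypothetical; they were constructed only so that the firms agree on 6 of 8 features, as in the Exhibit 5 caption.

```python
import math

def euclidean_distance(a, b):
    """Distance-type measure for scaled (rating) data; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simple_matching(a, b):
    """Matching-type measure for nominal data: share of items with identical answers."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Hypothetical "essential feature" needs (Y = required, N = not required)
firm_a = ["Y", "Y", "N", "N", "Y", "N", "Y", "Y"]
firm_b = ["Y", "N", "N", "N", "Y", "Y", "Y", "Y"]
similarity = simple_matching(firm_a, firm_b)    # 6 matches out of 8 -> 0.75

# Hypothetical agreement ratings on three items (1-7 scale)
resp_1 = [7, 2, 5]
resp_2 = [4, 6, 5]
distance = euclidean_distance(resp_1, resp_2)   # sqrt(9 + 16 + 0) -> 5.0
```

Note that the simple matching coefficient counts Y-Y and N-N agreements equally; other matching coefficients weight joint absences differently.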
3. Identifying and Removing Outliers
In most data collection exercises, there will be a few respondents whose answers are quite different from all the others. We refer to these respondents as ‘outliers.’ There are two general reasons for outliers to occur. The first is associated with data errors. The respondent may have misunderstood a question or may have put the answer in the wrong place, or coding or transcription errors may have occurred. In any case, the data are incorrect, and the response should either be corrected or the observation removed from the data set. The second reason for an outlier is that a respondent really has needs that are quite different from those of the rest of the sample. These respondents are of two sorts: they may represent an uninteresting, unique set of needs, or their needs may represent those of an emerging new segment. You will want to remove both sets of responses from the data set, but while you might ignore or discard the uninteresting responses, you will want to carefully analyze the potential lead users for early indications of where segments might migrate in the near future. It is best to search for outliers and remove them early in the analysis process, as retaining them leads to unstable and unreliable segmentation results.
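A simple screen for unusual respondents is sketched below. It assumes one particular rule that the text does not specify: a respondent is flagged when their average Euclidean distance to all other respondents exceeds twice the median of those averages. The data, the factor of 2 and the function names are hypothetical.

```python
import math
from statistics import median

def mean_distance_to_others(rows):
    """Average Euclidean distance from each respondent to every other respondent."""
    result = []
    for i, a in enumerate(rows):
        dists = [math.dist(a, b) for j, b in enumerate(rows) if j != i]
        result.append(sum(dists) / len(dists))
    return result

def flag_outliers(rows, factor=2.0):
    """Return row indices whose mean distance exceeds factor x the median mean distance."""
    scores = mean_distance_to_others(rows)
    cutoff = factor * median(scores)
    return [i for i, score in enumerate(scores) if score > cutoff]

# Hypothetical needs ratings; the last respondent answers very differently
responses = [
    [5, 6, 5], [6, 5, 6], [5, 5, 5], [6, 6, 5], [5, 6, 6],
    [1, 7, 1],
]
flagged = flag_outliers(responses)
```

Flagged rows are candidates for inspection rather than automatic deletion: as noted above, some will be data errors, while others may be early signs of an emerging segment.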
4. Forming Segments
Cluster analysis is a set of techniques for discovering structure (groupings) within a complex body of data, such as the segmentation-basis data matrix. To understand the process of segmentation, think about a deck of cards. Each card
varies from the other cards along three dimensions (variables): suit, color and number. If you are asked to partition a pack of cards into two distinct groups, you might sort them into red and black, or into numbered cards and picture cards. If you are asked to sort them into three groups, it is not intuitively obvious what these might be. A four-group solution is again easy: just sort them into suits. While you can partition a pack of cards intuitively, partitioning a large number of items into groups can be very complex, especially if those items vary along many different dimensions. Consider partitioning 25 items (or respondents in our case) into two groups, with at least one item in each group. There are $2^{24} - 1$ (= 16,777,215) possible partitions (market segments). In partitioning 25 items into five groups, the number of possibilities is 2,436,684,974,110,751 (about $2.44 \times 10^{15}$). Clearly we need a systematic and feasible method of finding the best partition. Cluster analysis (also known as numerical taxonomy or partitioning) represents a set of statistical techniques developed to address this problem. The input to cluster analysis is the set of distances or measures of association discussed above and in Appendix 2. There are two basic classes of clustering methods:
• Hierarchical methods, which build up or break down the data row by row, and
• Partitioning methods, which break the data into a pre-specified number of groups and then reallocate or swap data to improve some statistical measure of fit (e.g., the ratio of the within-group to between-group variation).
Our software includes one method of each type - Ward’s (1963) (hierarchical) and K-means (partitioning).
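The partition counts quoted above for 25 items can be checked with a short calculation. The sketch below assumes the counts are Stirling numbers of the second kind (the number of ways to split n distinct items into k non-empty, unlabeled groups); the function name `stirling2` is ours, for illustration only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to split n distinct items into k non-empty, unlabeled groups."""
    if n == k:
        return 1                      # also covers the n == k == 0 base case
    if k == 0 or k > n:
        return 0
    # The nth item either forms a new group on its own,
    # or joins one of the k groups already formed by the other n - 1 items.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


two_groups = stirling2(25, 2)
five_groups = stirling2(25, 5)
print(f"{two_groups:,}")
print(f"{five_groups:,}")
```

Even at this modest sample size, enumerating every candidate partition is impractical, which is why clustering routines search heuristically rather than exhaustively.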
Hierarchical methods themselves fall into two categories: build-up (agglomerative) methods and split-down (divisive) methods. Each type will
produce a tree like that shown in Exhibit 6 – formally called a dendrogram – to help identify the clusters. Agglomerative methods generally follow this procedure:
1. At the beginning, each item is considered to be its own cluster.
2. The routine then joins the two items that are closest on some chosen measure of distance.
3. It then joins the next two closest objects, either joining two items to form another group or attaching an item to an existing cluster.
4. It repeats step 3 until all items are clustered.
Exhibit 6: This distance matrix yields one dendrogram for single linkage clustering (solid line) and another for complete linkage clustering (dotted line). The cluster (segment) formed by companies 1 and 2 joins the segment formed by companies 3 and 4 at a much higher level in complete linkage (3.42) than in single linkage (1.81). In both cases company 5 appears to be different from the other companies, an outlier. A three-cluster solution will have A=(5), B=(1,2) and C=(3,4).
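The agglomerative procedure listed above can be sketched in a few lines. This is a minimal illustration, not the routine used in the software: the function name `agglomerate`, the `linkage` argument and the five made-up positions are all assumptions, and the distance matrix is not the one behind Exhibit 6.

```python
from itertools import combinations


def agglomerate(dist, linkage=min):
    """Build-up clustering on a symmetric distance matrix (list of lists).

    `linkage` turns the item-to-item distances between two clusters into one
    number: min gives single linkage, max gives complete linkage.
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    def gap(pair):
        a, b = pair
        return linkage(dist[i][j] for i in a for j in b)

    clusters = [(i,) for i in range(len(dist))]          # step 1: one cluster per item
    history = []
    while len(clusters) > 1:                             # step 4: repeat until done
        a, b = min(combinations(clusters, 2), key=gap)   # steps 2-3: closest pair
        history.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return history


# Hypothetical data (not the Exhibit 6 matrix): five items placed on a line.
positions = [0.0, 1.0, 5.0, 6.5, 20.0]
dist = [[abs(p - q) for q in positions] for p in positions]

single = agglomerate(dist, linkage=min)
complete = agglomerate(dist, linkage=max)
for a, b, height in single:
    print(a, b, height)
```

With these made-up positions, both rules join the same pairs first and attach the distant fifth item last, but the two pairs fuse at a greater height under complete linkage than under single linkage, which is the same qualitative pattern the exhibit describes.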
Agglomerative methods differ in how they join clusters to one another: In single linkage clustering (also called the nearest neighbor method), the routine considers the distance between clusters to be the distance between the two closest
items in those clusters. Single linkage clustering is most often used to identify outliers. In complete linkage clustering (also called the furthest neighbor method) the routine considers the distance between two clusters to be the distance between the pair of items in those clusters that are farthest apart so that all items in the new cluster formed by joining these two clusters are no farther than some maximal distance apart. In average linkage clustering, the routine considers the distance between two clusters A and B to be the average distance between all pairs of items in the clusters, where one of the items in the pair is from cluster A and the other is from cluster B. In centroid clustering the distance between two clusters is typically the Euclidean distance between their centroids (or means).
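To make the four between-cluster distance rules concrete, the sketch below computes each of them for two small clusters. The function names and the coordinates are invented for illustration; `math.dist` (Python 3.8+) supplies the Euclidean distance between two points.

```python
from math import dist
from statistics import fmean


def single_linkage(a, b):
    """Distance between the two closest items, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Distance between the two items that are farthest apart."""
    return max(dist(p, q) for p in a for q in b)


def average_linkage(a, b):
    """Average distance over all cross-cluster pairs of items."""
    return fmean(dist(p, q) for p in a for q in b)


def centroid_distance(a, b):
    """Euclidean distance between the cluster means."""
    def centroid(points):
        return [fmean(coords) for coords in zip(*points)]
    return dist(centroid(a), centroid(b))


# Hypothetical clusters of two items each, measured on two variables.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

for rule in (single_linkage, complete_linkage, average_linkage, centroid_distance):
    print(rule.__name__, round(rule(cluster_a, cluster_b), 3))
```

Single linkage always gives the smallest of the pairwise figures and complete linkage the largest, with average linkage in between, which is why the same data can produce different trees under different rules.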
Ward's method, one of the two methods included in our software, forms clusters based on the change in the error sum of squares associated with joining any pair of clusters (Appendix 3).
Partitioning methods are used most often when the analyst has a large data set. These methods are computationally efficient, and their output is much easier to interpret when many items (such as 50 or more customers) are being clustered. Unlike hierarchical methods, they do not allocate an item to a cluster irrevocably; that is, the routine will reallocate an item if doing so improves the statistical fit of the solution. These methods do not develop a tree-like structure; rather, they start with cluster centers and assign those individuals closest to each cluster center to that cluster. The most commonly used partitioning method is K-means clustering. The procedure works as follows:
1. The analyst specifies the number of clusters (n).
2. The routine begins with n (analyst-specified) starting points and allocates every item to its nearest cluster center.
3. It then reallocates items one at a time to reduce the sum of internal cluster variability until it has minimized the fit criterion (the sum of the within-cluster sums of squares) for n clusters.
4. After completing step 3, you may return to step 1 and repeat the procedure with a different number of clusters.
The solution to K-means clustering is sensitive to the selection of starting points in step 2 above; we recommend using the cluster centroids from Ward’s procedure as starting points (the procedure we use in our software). Management judgment, incorporating the goals of the segmentation, can help guide the selection of the appropriate number of segments.
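The K-means steps above can be sketched as follows. Note the assumption: this is the common batch (Lloyd-style) variant, which reassigns all items and then recomputes the centers, rather than the one-at-a-time reallocation described in step 3, and it is not a description of the software's implementation. The function name `kmeans` and the sample points are hypothetical.

```python
from math import dist
from statistics import fmean


def kmeans(points, centers, max_iter=100):
    """Batch K-means from analyst-supplied starting centers (assumed variant)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Allocate every item to its nearest cluster center.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:          # no item changed cluster: stop
            break
        labels = new_labels
        # Move each center to the mean of the items now assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                   # keep the old center if a cluster empties
                centers[k] = tuple(fmean(coords) for coords in zip(*members))
    # Fit criterion: sum of the within-cluster sums of squares.
    wss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wss


# Hypothetical respondents measured on two needs variables.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
start = [(0, 0), (10, 10)]                # e.g. centroids from a Ward's solution
labels, centers, wss = kmeans(points, start)
print(labels, round(wss, 3))
```

Passing the starting centers in explicitly, rather than drawing them at random, makes it easy to seed the run with centroids from a hierarchical solution as recommended above.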
5. Profiling Segments and Interpreting Results
After forming segments by following one of the above methods, you need to interpret the results and link them to managerial actions. This is a critical activity because the targeting and positioning decisions depend on the segments you choose to retain. You must address at least the following issues:
• Are there really any distinct clusters?
• How many clusters should you retain?
• How good (interpretable) and robust (stable) are the clusters?
• How should the clusters be profiled?
What if there really are no clusters? Don’t overlook this possibility. If only one or two basis variables show meaningful differences between respondents, it is possible that no really distinct segments exist in the market. This could be the result of your selecting a poor set of segmentation bases or perhaps because customer needs in your sample really don’t differ too much.
If the revealed segment structure is weak, then you should build up a rich picture of likely customers, profiling them with the descriptor variables and any information obtained by exploratory research methods.
How many clusters should you retain? This decision involves both art (the purpose of the study) and science (statistical criteria). It is useful to generate a number of potential segmentation schemes and, using statistical criteria, identify the best two or three of these. Then the managers or users can help decide which of these remaining schemes will be most useful.
How good are your clusters? That is, how well do the clusters obtained from this particular sample of individuals generalize to the market as a whole? Too few segmentation studies try to answer this question. Even if the sample is representative, there may be measurement problems, or the analyst may have made some poor choices. Good cluster solutions are usually robust; that is, they are generally stable across methods of segmentation and across measures of association. A good way to check how robust the clusters are is to use different methods (or different measures of association) and then cross-tabulate the results to see how many respondents are assigned to the same groups by the different methods (or different association measures): the higher the percentage, the better. Another way to test robustness is to split the data randomly into two groups, run the segmentation analysis separately on each half of the data and then compare results. Again, the higher the level of agreement the better. If different approaches assign fewer than 70% or so of cases to the same groups, you should view the segmentation structure with caution.
A second aspect of segmentation quality relates to managerial usefulness: do the users (salespeople, call center personnel, advertising managers) find the results valuable? One way to help understand managerial usefulness is to see if managers can invent intuitively appealing names for the segments. Normally, one looks at cluster means (the average values of the basis and descriptor variables for each segment) and sees if one can characterize the segment members. For example, active investors need much more timely information about their share portfolios than do the “invest and forgets.” They will probably also be the most receptive customers to electronic trading schemes.
How should the clusters be profiled? The idea behind cluster profiling is to prepare a picture of the revealed clusters based on both the variables used for the clustering (the segmentation bases) and those used to identify and target the segments (the descriptors). The most direct approach to profiling is to compare the average values of the bases and the descriptor variables in each cluster. Exhibit 7 is a snake chart, based on the data in Exhibit 4. One segment concerned with buying PCs has a high relative need for power, color, storage and peripherals and is not price sensitive (budget). We have labeled this the design segment. The other segment, the business segment, is more interested in office use, local area networks (LAN) and wide area connectivity. This group is quite price sensitive. Profiling of the other (descriptor) variables in Exhibit 4 shows that the design segment includes primarily design engineers from smaller firms. A more formal approach to profiling includes a technique called discriminant analysis, which seeks combinations of descriptor variables that best separate the clusters or segments (McLachlan, 1992).
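The direct approach to profiling, comparing average values by cluster, amounts to a grouped mean. The sketch below is illustrative only: the function name `profile`, the variable names and the four respondents are invented and are not the Exhibit 4 data.

```python
from collections import defaultdict
from statistics import fmean


def profile(rows, labels, variables):
    """Mean of each variable within each segment (the points on a snake chart)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {v: fmean(r[v] for r in members) for v in variables}
        for label, members in groups.items()
    }


# Hypothetical needs ratings on a 1 to 7 scale (not the Exhibit 4 data).
rows = [
    {"power": 7, "storage": 6, "budget": 2},
    {"power": 6, "storage": 7, "budget": 3},
    {"power": 3, "storage": 2, "budget": 6},
    {"power": 2, "storage": 3, "budget": 7},
]
labels = ["design", "design", "business", "business"]

means = profile(rows, labels, ["power", "storage", "budget"])
for segment, values in means.items():
    print(segment, values)
```

Plotting each segment's row of means against the variable names gives the kind of snake chart described above; the same function can be applied to descriptor variables once they are coded numerically.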
Exhibit 7: Segment profiles (snake chart) for two segments based on the data from Exhibit 4.
Customer Value Data -> Remove Outliers -> Ward’s Cluster Analysis -> Randomly Split Data (D1, D2) -> K-means Clustering -> Iterate until stable cluster solution(s) found -> Cross-validate Cluster Solutions -> Combine D1 and D2 -> K-means Clustering of Stable Solution(s) -> Discriminant Analysis to Profile Segments -> Bring the Segments to Life
Exhibit 8: Typical Process of Direct Segment Formation.
Exhibit 8 outlines the main steps of how the process we have described so far might work. There will be many false starts here, but the diagram should provide a good, rough benchmark.
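The cross-validation step in Exhibit 8, like the cross-tabulation check described earlier, needs a way to compare two sets of segment assignments whose labels are arbitrary. A minimal sketch follows, assuming both solutions have the same number of segments; the function name `agreement` and the two label vectors are hypothetical. It tries every relabeling, which is only practical for a handful of segments.

```python
from itertools import permutations


def agreement(labels_a, labels_b):
    """Share of respondents placed in the same segment by two solutions,
    after relabeling solution B to match solution A as well as possible."""
    names_a = sorted(set(labels_a))
    names_b = sorted(set(labels_b))
    if len(names_a) != len(names_b):
        raise ValueError("solutions must have the same number of segments")
    best = 0
    for perm in permutations(names_a):
        mapping = dict(zip(names_b, perm))          # one candidate relabeling of B
        hits = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, hits)
    return best / len(labels_a)


# Hypothetical assignments of ten respondents by two methods.
ward_labels = [1, 1, 1, 2, 2, 3, 3, 3, 3, 2]
kmeans_labels = ["b", "b", "b", "a", "a", "c", "c", "c", "a", "a"]

share = agreement(ward_labels, kmeans_labels)
print(share, "view with caution" if share < 0.70 else "reasonably stable")
```

The 0.70 cut-off in the last line simply encodes the rule of thumb given in the text; it is a judgment threshold rather than a statistical test.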
REFERENCES
Dillon, William R., and Goldstein, Matthew, 1984, Multivariate Analysis: Methods and Applications, John Wiley and Sons, New York, pp. 173-174.
McLachlan, Geoffrey J., 1992, Discriminant Analysis and Statistical Pattern Recognition, John Wiley and Sons, New York.
Ward, J., 1963, "Hierarchical grouping to optimize an objective function," Journal of the American Statistical Association, Vol. 58, pp. 236-244.
Wedel, Michel, and Kamakura, Wagner, 2000, Market Segmentation: Conceptual and Methodological Foundations, second edition, Kluwer Academic Publishers, Boston, MA.
Wind, Yoram J., 1978, "Issues and advances in segmentation research," Journal of Marketing Research, Vol. 15, No. 3 (August), pp. 317-337.
Appendix 1: Factor Analysis for Preprocessing Segmentation Data
Segmentation studies often rely on measurements (observations) about individuals on a number of attributes (variables). However, correlated variables may mask the true segment structure. To address this problem we can use factor analysis to preprocess segmentation data before using cluster analysis. The objective is to reduce the data from a large number of correlated variables to a much smaller set of independent underlying factors, with this reduced set retaining most of the information contained in the original data. The derived factors not only represent the original data parsimoniously, they often result in more reliable segments when used in cluster analysis procedures. The form of factor analysis that is useful for segmentation studies is slightly different from the approach described in this chapter. Let X be an m x n data matrix consisting of needs (or attitudinal) data from m respondents on n variables. The input data are standardized. Let Xs represent the standardized data matrix. We work with unstandardized factor scores, where we denote the factor score matrix as P: (A1.1) (A1.2) where xi = ith column from the standardized data matrix Xs; xki is the element in the kth row and ith column of this matrix; Pj = the jth column of the factor score matrix representing the scores of each respondent on factor j; P=[P1, P2, ... , Pr] is the factor-score matrix with r retained factors; and u = “loadings” that characterize how the original variables are related to the factors.
We seek r factors to represent the original data, where r is smaller than n, the number of variables we started with. If we can pick an r that is less than 1/3 of n, but where the retained factors account for more than 2/3 of the variance in the data, we can then consider the preprocessing of the data to be successful. There is, however, always a danger that preprocessing the sample data loses some important information and thereby masks the true cluster structure. Thus, it is often a good idea to run the model with and without preprocessing of the data through factor analysis, to see which set of results makes the most sense. To aid interpretability of the factors, we can orthogonally rotate the initial factor solution (Varimax rotation) so that each original variable is correlated most closely with just a few factors, preferably with just one factor. We can then use the factor-score matrix with r factors as the set of input variables for identifying segments through cluster analysis. By using unstandardized factor scores at this stage, we can determine during cluster analysis whether to standardize the factor scores, an option that we can select within the cluster analysis software.
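The rule of thumb above (fewer than one-third as many factors as variables, explaining more than two-thirds of the variance) can be checked once the eigenvalues of the correlation matrix are available from any factor analysis routine. The sketch below assumes those eigenvalues are given; the function names and the nine eigenvalues are hypothetical.

```python
def factors_to_retain(eigenvalues, min_variance=2 / 3):
    """Smallest r whose leading factors explain more than `min_variance`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for r, value in enumerate(ordered, start=1):
        running += value
        if running / total > min_variance:
            return r, running / total
    return len(ordered), 1.0


def preprocessing_successful(eigenvalues):
    """Apply the rule of thumb: r < n/3 with more than 2/3 of variance retained."""
    r, explained = factors_to_retain(eigenvalues)
    return r < len(eigenvalues) / 3, r, explained


# Hypothetical eigenvalues for n = 9 standardized variables (they sum to 9).
eigenvalues = [4.1, 2.3, 0.8, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1]
ok, r, explained = preprocessing_successful(eigenvalues)
print(ok, r, round(explained, 3))
```

A result that fails the check does not rule out factor analysis; as noted above, it is worth comparing the cluster solutions obtained with and without preprocessing.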
Appendix 2: Similarity Measure Issues and Examples
Matching coefficients are appropriate when the data are nominal (feature X is required/not required). The following example illustrates how they are constructed.
Example: We ask respondents from four organizations that are purchasing a photocopier to state which of its eight features (F) are essential (F1 = sorting, F2 = color, etc.), with the following result:

Essential Features? (Yes or No)
                 F1   F2   F3   F4   F5   F6   F7   F8
Organization A   Y    Y    N    N    Y    Y    Y    Y
Organization B   N    Y    N    N    N    Y    Y    Y
Organization C   Y    N    Y    Y    Y    N    N    N
Organization D   Y    N    N    N    Y    Y    Y    Y

We can define one similarity measure (among the organizations across these eight features) – a similarity coefficient – as
Similarity coefficient = number of matches/total possible matches (0.8).
The resulting associations are shown in Exhibit A2.1.

               Organization
Organization   A     B     C     D
A              1
B              6/8   1
C              2/8   0/8   1
D              7/8   5/8   3/8   1

Exhibit A2.1: Similarity data for “essential features” data: firms A and B match on 6 of their essential feature needs (Y-Y or N-N) out of 8 possible matches.
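The entries in Exhibit A2.1 can be reproduced directly from the Yes/No table. The sketch below encodes each organization's answers as a string of Y and N in feature order F1 to F8; the function name `matches` and its `positive_only` flag are ours, with the flag covering the variant that counts only Yes-Yes agreements.

```python
from itertools import combinations

# Essential-feature answers for F1..F8, copied from the table above.
needs = {
    "A": "YYNNYYYY",
    "B": "NYNNNYYY",
    "C": "YNYYYNNN",
    "D": "YNNNYYYY",
}


def matches(a, b, positive_only=False):
    """Count features on which two organizations give the same answer."""
    if positive_only:
        return sum(x == y == "Y" for x, y in zip(a, b))   # Yes-Yes only
    return sum(x == y for x, y in zip(a, b))              # Y-Y or N-N


simple = {(p, q): matches(needs[p], needs[q]) for p, q in combinations(needs, 2)}
for (p, q), count in simple.items():
    print(f"{p}-{q}: {count}/8")
print("A-B, Yes-Yes only:", matches(needs["A"], needs["B"], positive_only=True), "/ 8")
```

Dividing each count by eight gives the similarity coefficient; keeping the counts as integers avoids rounding and makes them easy to compare with the exhibit.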
Researchers develop other types of matching coefficients in a similar fashion, often weighting differences between positive and negative matches differently. For example, suppose we counted only the number of positive (Yes-Yes) matches in the above table; in that case there would still be a possibility of eight matches, but organizations A and B would have only four of those possible eight matches (4/8) instead of the six (6/8) shown.
When the data have interval-scale properties (that is, when differences between data points, like age differences and income differences, have meaning) we usually construct some form of distance measure. Distance-type measures fall into two categories: measures of similarity or measures of dissimilarity, where the most common measure of similarity is the correlation coefficient and the most common measure of dissimilarity is the (Euclidean) distance. Two common distance measures are defined as follows:
• Euclidean distance $= \sqrt{(x_{1i} - x_{1j})^2 + \dots + (x_{ni} - x_{nj})^2}$   (A2.1)
where i and j represent a pair of observations, $x_{ki}$ = value of observation i on the kth variable and 1 to n are the variables.
• Absolute distance (city-block metric) $= |x_{1i} - x_{1j}| + \dots + |x_{ni} - x_{nj}|$   (A2.2)
where | | means absolute value.
All distance measures are problematic if the scales are not comparable, as the following example shows.
Example: Consider three individuals with the following characteristics:

               Income ($thousands)   Age (years)
Individual A   34                    27
Individual B   23                    34
Individual C   55                    38

Straightforward calculation of Euclidean distances across these two characteristics gives
dAB = 13.0, dAC = 23.7, and dBC = 32.2.
However, if age is measured in months, rather than years, we get
dAB = 84.7, dAC = 133.7, and dBC = 57.7.
In other words, when we use months, individuals B and C are closest together; when we use years, they are farthest apart. To avoid this scaling problem, many analysts normalize their data (divide each variable by its standard deviation) before doing the distance calculation. This allows them to weight all variables equally in computing the distance in equation (A2.1). It is, however, important not to standardize the data in some cases; for example, if the segmentation is being done on needs data obtained by such procedures as conjoint analysis, the values of all the variables are already being measured on a common metric.
A frequently used measure of association is the correlation coefficient. Let $X_1, \dots, X_n$ = data from organization x and $Y_1, \dots, Y_n$ = data from organization y, and let $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ be the differences from the mean values $\bar{X}$ and $\bar{Y}$. Then
$r_{xy} = \dfrac{x_1 y_1 + \dots + x_n y_n}{\sqrt{(x_1^2 + x_2^2 + \dots + x_n^2)(y_1^2 + y_2^2 + \dots + y_n^2)}}$   (A2.3)
Warning: The correlation coefficient incorporates normalization in its formula. However, it also removes the level effect, because each individual's ratings are measured relative to that individual's own mean. So an individual who gives high ratings on all items (6s and 7s on a 1 to 7 scale) could be perfectly correlated (r = 1) with two other individuals, one who also gave high ratings and another who gave low ratings (1s and 2s on a 1 to 7 scale), provided the ratings rise and fall together across items! (With exactly constant ratings, the deviations from the mean are all zero and the correlation is undefined.) For this reason, we feel that, while correlation coefficients are commonly used in segmentation studies, the results of such studies should be carefully scrutinized.
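The warning can be illustrated numerically with equation (A2.3). In the sketch below the two rating vectors are invented; they vary slightly from item to item because the deviations in (A2.3) are all zero for perfectly constant ratings, which leaves r undefined. The function name `correlation` is ours.

```python
from math import dist, sqrt
from statistics import fmean


def correlation(xs, ys):
    """Correlation coefficient computed as in equation (A2.3)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    dx = [x - mean_x for x in xs]          # deviations from the mean
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical ratings on a 1 to 7 scale: same ups and downs, different levels.
high_rater = [7, 6, 7, 6, 7]
low_rater = [2, 1, 2, 1, 2]

r = correlation(high_rater, low_rater)
gap = dist(high_rater, low_rater)          # Euclidean distance, equation (A2.1)
print(round(r, 3), round(gap, 2))
```

The two respondents look identical to the correlation coefficient yet are far apart in Euclidean terms, so the choice of association measure determines whether they end up in the same segment.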
We recommend that if you have scaled data, you standardize that data first (subtract its mean and divide by its standard deviation), and use a Euclidean distance measure.
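The income and age example from this appendix shows why this recommendation matters: raw Euclidean distances change with the unit of measurement, while distances on standardized data do not. The sketch below reuses the three individuals from the example; the helper names are ours, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which form is intended.

```python
from math import dist
from statistics import fmean, pstdev

# (income in $thousands, age in years) for individuals A, B and C.
years = [(34, 27), (23, 34), (55, 38)]
months = [(income, age * 12) for income, age in years]


def standardize(rows):
    """Subtract each variable's mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    sds = [pstdev(c) for c in columns]     # assumption: population form
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def pairwise(rows):
    """Euclidean distances between the three individuals."""
    a, b, c = rows
    return {"AB": dist(a, b), "AC": dist(a, c), "BC": dist(b, c)}


raw_years, raw_months = pairwise(years), pairwise(months)
z_years, z_months = pairwise(standardize(years)), pairwise(standardize(months))
for name, table in [("raw, years", raw_years), ("raw, months", raw_months),
                    ("standardized, years", z_years), ("standardized, months", z_months)]:
    print(name, {pair: round(d, 2) for pair, d in table.items()})
```

The two standardized rows agree with each other, whereas the raw rows rank the pairs differently, which is the scaling problem described in the example.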
Appendix 3: An Illustration of Ward's Method for Clustering
Drawn from Dillon and Goldstein (1984). Suppose that we have five customers and we have measurements on only one characteristic: intention to purchase on a 1 to 15 scale:

Customer   Intention to purchase
A          2
B          5
C          9
D          10
E          15

Using Ward’s (1963) procedure, the clusters are formed based on minimizing the loss of information associated with grouping individuals into clusters. Loss of information is measured by summing the squared deviations of every observation from the mean of the cluster to which it is assigned. Ward’s method assigns clusters in an order that minimizes the error sum of squares (ESS) from among all possible assignments, where ESS is defined as
$ESS = \sum_{j=1}^{k} \left[ \sum_{i=1}^{n_j} X_{ij}^2 - \frac{1}{n_j} \left( \sum_{i=1}^{n_j} X_{ij} \right)^2 \right]$,   (A3.1)
where Xij is the intent to purchase score for the ith individual in the jth cluster, k is the number of clusters at each stage, and nj is the number of individuals in the jth cluster. Exhibit A3.1 shows the calculations, and Exhibit A3.2 gives the related dendrogram. The ESS is zero at the first stage. At stage 2, the procedure considers all possible clusters of two items; C and D are fused. At the next stage, the routine considers both adding each of the three remaining individuals to the CD cluster and forming all possible pairs of the three remaining unclustered individuals; A and B are clustered. At the fourth stage, C, D and E form a cluster. At the final (fifth) stage, all individuals are combined into a single cluster.
Exhibit A3.1: Summary calculations for Ward’s ESS (Error Sum of Squares) method. Source: Dillon and Goldstein, 1984, p. 174.
Exhibit A3.2: Dendrogram for Ward’s ESS method. Source: Dillon and Goldstein, 1984, p. 174.
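The stages described in this appendix can be retraced with a short script. The sketch below applies the ESS definition in (A3.1) to the five intention-to-purchase scores and, at each stage, fuses the pair of clusters whose merger adds least to the total ESS. The function names are ours, and the code is an illustration of the idea rather than the routine used in the software.

```python
from itertools import combinations

scores = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}   # data from the table above


def ess(cluster):
    """Error sum of squares of one cluster: the bracketed term in (A3.1)."""
    values = [scores[name] for name in cluster]
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)


def ward(names):
    """Return (merged cluster, total ESS) after each fusion."""
    clusters = [(name,) for name in names]             # stage 1: ESS is zero
    stages = []
    while len(clusters) > 1:
        # Fuse the pair whose merger adds least to the total ESS.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: ess(pair[0] + pair[1]) - ess(pair[0]) - ess(pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        stages.append((clusters[-1], sum(ess(c) for c in clusters)))
    return stages


stages = ward(scores)
for merged, total in stages:
    print("".join(merged), round(total, 2))
```

The order of fusions produced this way (C with D, then A with B, then E joining CD, then everything) matches the narrative above, and the running ESS totals are the heights at which the branches of the dendrogram join.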