Quantitative Quality Control from Qualitative Data: Control Charts with Latent Semantic Analysis
Triss Ashton*, Assistant Professor, The University of Texas – Pan American, 1201 West University, Edinburg, TX 78539, e-mail: [email protected], (956) 665-3353, (956) 665-3367 (fax)
Nicholas Evangelopoulos, Associate Professor, University of North Texas, 1155 Union Circle #311160, Denton, TX 76203-5017
Victor R. Prybutok, Regents Professor, University of North Texas, 1155 Union Circle #311160, Denton, TX 76203-5017
* Corresponding Author
ABSTRACT Large quantities of data, often referred to as Big Data, are now held by companies. This Big Data includes statements of customer opinion regarding product or service quality in an unstructured textual form. While many tools exist to extract meaningful information from Big Data, automated tools do not exist to monitor the ongoing conceptual content of those data. We use latent semantic analysis to extract concept factors related to service quality categories. Customer comments found in the data that express dissatisfaction are then treated as non-conforming observations in a process. Once factors are extracted, proportions of nonconformities for service quality failure categories are plotted on a control chart. The results are easily interpreted, and the approach allows for the quantitative evaluation of customer acceptance of system process improvement initiatives.
Keywords Analysis of text data, big data, control charts, customer satisfaction, latent semantic analysis (LSA), statistical process control (SPC), text mining.
1. INTRODUCTION As a result of growth in the internet and personal communication devices, businesses of all types now inexpensively collect information from customers on a continuous basis. Companies including manufacturers, service providers, retailers, and online businesses are soliciting input from customers about their experience, the products, and services. Increasingly, the collected data consist of open-ended text and include information that describes the customer's perceptions about service or product attributes. As Fig. 1 illustrates, this information provides important and powerful feedback on the firm's offerings. In fact, in a recent survey of data management professionals, 35 percent of firms indicated they were now collecting unstructured data that included textual information (Russom, 2011). Tremendous strides were made in recent years to automate the analysis of unstructured text data. Those efforts primarily focused on technology for decomposing corpora with varying latent structural dimensionality into smaller collections based on dimensions (Deerwester et al., 1990), factors (Sidorova et al., 2008), topics (Arora et al., 2012), sentiment (Liu, 2012), or concepts (Shehata et al., 2013). While these latent semantic approaches sort the elements of the corpus into subgroups that possess similar characteristics, these smaller groups of data are not directly consumable in a decision-making environment. Generally, in order for the results of the text analysis process to produce an actionable output, subsequent analysis is required to link the data to actions. Complexities in the analysis of unstructured textual data often result in only minimal use of the data. Frequently, this lack of use occurs because the results are not presented in a form consumable by either the business leadership or the engineering communities. For example, in the quality assurance area a plethora of system monitoring tools exists. The path
shown with the dashed arrow in Fig. 1 depicts the needed link for methods that extract meaningful, substantive information from textual data analysis results and generate monitoring tools that complete the feedback loop. Therefore, a firm that is determined to discover the data's hidden value might be required to commit considerable resources to the effort using traditional manual analysis methods. One proposal (Ashton et al., 2013a) suggests the application of control charts to the solution generated by a text mining algorithm. That work proposed the use of EWMA and p-charts, discussed how the charts tracked different statistics from the solution, and reviewed methodological issues with the implementation of control-charted text. This work extends and builds upon the agenda posited in Ashton et al. (2013a) by introducing quality control of service quality through customer-generated text data. In this study, we propose an approach that uses latent semantic analysis (LSA) to reduce raw, unstructured textual information to a set of latent semantic factors or topics. Broadly speaking, when these topics are derived from a business-origin corpus, many will quantitatively describe service quality, product quality, service characteristics the customer prefers, and desired product design characteristics. We consider each of the discovered topics that constitutes a category of nonconformity as a topic that requires further investigation and analysis. These categories of nonconformity include topics that explain sources of customer satisfaction or dissatisfaction with a business process or offering.
[Figure 1: diagram of the feedback loop connecting Manufacturer/Service Provider, Product/Service, Customers, Customer comments, Text mining of customer comments, and Quality monitoring tools.]
Fig. 1. Product or service quality monitoring through analysis of customer comments.
Presenting text data on control charts yields a tool usable by both the manager and the engineer. Control charting text data allows management to understand the severity of nonconformity categories, conduct comparisons among the categories of nonconformity, and prioritize the categories for subsequent corrective action. Such charts are valuable to operations managers because they limit the likelihood of ad hoc responses to a few isolated comments. While this control-charting-text procedure will only be demonstrated in the service quality domain, the procedure potentially works just as well with data in which customers discuss desired design characteristics. The desired design characteristics will fall out as discussion topics, and the chart parameters document how prevalent the discussion of each characteristic is among the corpus members. Such results could then guide design and redesign efforts, as well as product positioning. An added benefit is that, when the stage division technique is applied, the resulting control chart allows judging the success or failure of process improvement projects. Staging divides the
control chart based upon project implementation timing. The parameters of the chart are then recomputed based on new observations. When a process improvement project is successful, the process mean will reflect the improvement. The rest of this paper is organized as follows: related work is discussed in Section 2; we review latent semantic analysis in Section 3; we describe the data used in the experimental phase and briefly review LSA’s decomposition of the data in Section 4; we then review the characteristics of the working data that is applied to the SPC process in Section 5; we review control chart alternatives in Section 6; we develop our text data example with control charts in Sections 7 and 8; and we provide the conclusions for our work in Section 9.
2. RELATED WORK AND PROPOSED METHODOLOGY Little related work exists on extending text analytics beyond the decomposition phase. The first attempt at applying text analytic results to a statistical process control tool occurred in Lo (2008). That work proposed a new tool known as Web-complaint Quality Control (WebQC). WebQC examined customer-originated comments that were collected through a website message board. The collected data were analyzed with a support vector machine (SVM), which classified each comment into one of four categories or topics: disorder codes, technical problems, dissatisfaction reports, and other. Once the corpus was analyzed by the SVM, all topics consisting of customer comments related to complaints and technical problems were combined, or pooled, into one single data stream and then plotted on a p-chart. Text-mined data were applied to the exponentially weighted moving average (EWMA) control chart in Ashton et al. (2013b). Unlike Lo (2008), once the corpus was decomposed, the k extracted topics retained their separate identities and were plotted onto k EWMA charts. The
advantage of this strategy is that each control chart graphically displays the system state regarding a specific nonconformity cause. That work was premised on the fact that eigenvector component values are measures of length given by the Euclidean norm

‖v‖ = √( ∑i vi² ).    (1)
With this interpretation, the factor loadings generated by LSA were averaged based on a variable-sized weekly sample and then loaded to the EWMA chart. While these methods of extending text mining to statistical quality control tools are sound, they have limitations. In Lo (2008), the resulting chart loses its specificity with regard to the cause of an out-of-control condition due to the pooling. Lo's control chart will alert management to a problem; however, the specific issue(s) that caused the out-of-control condition are lost. Ashton et al. (2013b) corrects for that issue; however, the plotted values are the loading values. There are a number of fundamental problems with that method, described as limitations in Ashton et al. (2013a). These problems are a result of the nature of text-based communication and the principle of least effort (Zipf, 1949). Zipf (1949) states that a person attempts to communicate efficiently with the least effort. Consistent with the principle, we find some customers of the business enterprise express an opinion in a terse fashion, while other customers are extremely elaborate, detailed, and even verbose when expressing the same opinion. When the EWMA charting strategy described in Ashton et al. (2013b) encounters these two uniquely styled writers, the first is usually scored with a much lower factor value than the second. Therefore, the two communications, each expressing the same level of non-conformity, are charted with dissimilar values. The EWMA method allocates a level of severity to the non-conformity based on how completely the author replicated the list of terms that defined the topic in the analytic solution and not on the degree of true perceived non-conformance. For example, the terse customer comment "late deliveries" receives a lower loading value, and hence a lower plotting point on the control chart, than the verbose (and more complete, list-wise) customer comment "your logistics processing caused the delivery of the baby dolls I ordered for my daughter's birthday to be late‼" when the topic-defining terms are logistic, process, late, order, and delivery. To address these limitations, we first propose that the factor solutions be kept separate. Then each factor represents a specific issue. Second, since few authors use the entire collection of terms that define a topic, we consider all documents that meet a certain threshold equally representative of the topic's definition. We then discretize the documents' factor-loading values, with those exceeding some minimum threshold coded as 'one' and those below this threshold coded as 'zero'. We are then able to use the p-chart, with each chart representing non-conforming observations for a specific cause. The p-chart method also addresses another limitation of the prior techniques.
As described by attribution theory (Heider, 1958), a dissatisfied customer may express many reasons or justifications for dissatisfaction. In such a situation the customer is often rationalizing their opinion. When this happens, they write about more than one of the latent factors that exist in the corpus. This is a common occurrence in actual textual data sets. During the singular value decomposition phase of LSA, because two factors are present, lower eigenvector component values are computed for the document against each of the factors. Therefore, the presence of the second factor in the document reduces the eigenvector component for the first factor and vice versa. In the EWMA instance, one document is scored against, or accounted for in, the control chart of each of the nonconformity causes, however with low values. Discretizing fixes this limitation and allows multiple causes for non-conformity that appear in a text to be fully valued in each corresponding control chart. Attribution is a multifaceted issue. In the data used in the subsequent analysis, customers in 28% of the observations attribute two causes for closing their account, 11.5% cite three causes, 3.2% report four, 1% cite five, and 0.07% cite six causes for account closure. Only 35% of respondents report exactly one reason for closing their account. In total, 8215 customers cited 11,937 justifications for closing their account.
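The discretization proposed above can be sketched in a few lines. This is a minimal illustration with made-up loadings and a made-up threshold, not the case-study data; the function name `discretize` is our own.

```python
# Sketch: discretize document factor loadings (rows = comments, cols = factors).
# Loadings at or above the threshold become 1 (non-conforming for that cause),
# all others become 0; each factor column then feeds its own p-chart.
# The loadings and the 0.27 threshold below are illustrative only.

def discretize(loadings, threshold):
    return [[1 if l >= threshold else 0 for l in row] for row in loadings]

loadings = [
    [0.61, 0.05, 0.33],   # comment cites factors 1 and 3
    [0.10, 0.48, 0.02],   # comment cites factor 2 only
    [0.04, 0.03, 0.07],   # noise: no factor exceeds the threshold
]
binary = discretize(loadings, threshold=0.27)
print(binary)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```

Note how a comment citing two causes (the first row) is fully counted against both factors, which is exactly the attribution behavior the discretization is meant to preserve.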
3. LATENT SEMANTIC ANALYSIS In this section, we review only those key equations of LSA required to support the analysis undertaken. For the reader wishing a more detailed overview of LSA, we refer them first to Appendix C of Deerwester et al. (1990), then to Sidorova et al. (2008) and Manning et al. (2008). For a more extensive discussion of matrix decompositions, see Schott (2005). For a more extensive discussion of LSA base transformations, see Hu et al. (2007). LSA was originally introduced as latent semantic indexing for information retrieval queries (Deerwester et al., 1990). LSA was subsequently proposed as a theory in psychology and cognitive science (Evangelopoulos, 2013) for extracting and representing word meaning via latent concepts (Landauer, 2007). Sidorova et al. (2008) expanded on this approach, introducing matrix rotations in order to extract interpretable factors. In this paper, we build upon the recommendations proposed by Evangelopoulos et al. (2012) for the factor analysis variant of LSA to fit the unique context of the present control chart application. LSA starts with the compilation of a matrix Xt×d, a t×d frequency count matrix where t is the number of terms and d is the number of documents in the collection, known as the vector
space model (Salton, 1975). In our work, the documents are the individual customer comments, which originally produce a 4786 × 8215 raw Xt×d matrix. Matrix X is typically reduced in size by eliminating common or low-information-value words such as prepositions and coordinating conjunctions (and, as, at, but, by, for, from, in, nor, of, on, or, so, to, with, and yet), retaining only terms with frequencies greater than one (Deerwester et al., 1990), and then weighted and normalized (Salton and Buckley, 1988). Our final Xt×d matrix after normalization was 256 × 8215. It is then decomposed as

Xt×d = Ut×r Σr×r VTr×d

using Singular Value Decomposition (SVD), the extension of principal components analysis that estimates simultaneous least squares principal components for two sets of variables. Ut×r is a term-by-factor matrix of eigenvectors of XXT, the t×t term covariance matrix, that define r semantic themes in the data. The terms associated by these eigenvectors can be used to interpret or define the latent semantic factors. Σr×r is a diagonal matrix of singular values. VTr×d is a factor-by-document matrix of eigenvectors of XTX, the d×d document covariance matrix, that associates the original documents to the factors. The SVD products are truncated to a reduced space of only the first k singular values, σ1,…,σk, such that

X̂t×d = Ut×k Σk×k VTk×d

is the best rank-k least squares estimate of Xt×d. In the data set used in this study, k = 20 was used for this estimate. The outcomes of interest from SVD are the two loading matrices, LT, a t×k matrix of term loadings, and LD, a d×k matrix of document loadings. Using the orthonormality property VTV = I, where I is the k×k identity matrix, we obtain

LT = X̂t×d Vd×k = Ut×k Σk×k.    (2)

Matrix LT associates the various terms with specific factors. This term-factor association is used to facilitate the factor labeling process. To obtain document factor loadings, we use the orthonormality property UTU = I, where I is the k×k identity matrix, and Σ = ΣT. Transposing X̂t×d and postmultiplying both sides by Ut×k gives

LD = X̂Tt×d Ut×k = Vd×k Σk×k.    (3)

Now given any k×k orthonormal rotation matrix M, having the property MMT = I, we can obtain rotated document loadings

LD′ = LD M = Vd×k Σk×k M.
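The truncation and loading-matrix computations above can be sketched numerically. The toy matrix below stands in for a (weighted, normalized) term-by-document matrix; its dimensions and values are illustrative only.

```python
import numpy as np

# Sketch of the truncated SVD step: X plays the role of the weighted
# term-by-document matrix; k is the retained rank (k = 20 in the paper,
# k = 2 here for this toy example).
rng = np.random.default_rng(0)
X = rng.random((6, 8))          # 6 "terms" x 8 "documents", illustrative only
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T   # truncate to rank k

LT = Uk @ Sk                    # term loadings,     t x k  (Eq. 2)
LD = Vk @ Sk                    # document loadings, d x k  (Eq. 3)

Xhat = Uk @ Sk @ Vk.T           # best rank-k least squares estimate of X
# Orthonormality identities used in Eqs. (2) and (3):
assert np.allclose(Xhat @ Vk, LT)     # Xhat V = U Sigma
assert np.allclose(Xhat.T @ Uk, LD)   # Xhat' U = V Sigma
```

The two assertions verify that the loading matrices can be obtained either directly from the truncated factors or by projecting the rank-k estimate, as the derivations of (2) and (3) state.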
4. EXPERIMENTAL DATA The data used in this analysis come from a Fortune 500 specialty retailer offering an online service to U.S. customers. Because this service provider is a member of a highly competitive industry, various facts were altered to prevent revealing the provider's identity. In the discussion that follows, a descriptive word in square brackets [ ] replaces service and product names, e.g. [SKU], to mask the identity of the firm. Certain figures that could disclose the identity of the firm are modified proportionately without altering meaningful ratios. The original data set consists of over 820,000 comments made by customers who had decided to cancel their subscription service with the provider over a 27-month period. As a step in the cancellation process, the company asked the customer why they were cancelling their subscription service and provided an open-ended text box in which the response was recorded. Customers received no reference cues; however, they were forced to provide text in the box in order to complete the cancellation. Because the original data set is exceptionally large, the analysis presented here considered a one-percent random sample. Throughout the 27 months of data collection, the service provider was proactive with the data using traditional analytic techniques, e.g. manually reading, interpreting, and running follow-up queries based on analyst interpretations. The firm also actively introduced strategic initiatives that affected various business processes in response to nonconformities found in the customer comments. These initiatives affect attempts at quality control charting because they change the underlying business process, and they are therefore used to establish stage change points in the control charts. The list of initiatives that we expect to cause shifts in the mean, as reported by the firm, appears in Table 1.

Table 1
List of key initiatives

Nr  Week  Initiative
1   1     Started with K1 distribution centers, 2,500 [SKUs]
2   7     Opened K2 new distribution centers
3   9     Increased to 3,000 [SKUs]
4   22    Major software revision: billing, shipping, website all affected
5   24    Increased price
6   26    Increased number of [SKUs] to 4,000
7   38    Opened K3 new distribution centers
8   47    Increased number of [SKUs] to 5,000
9   48    Started shipping [SKUs] from brick-and-mortar stores
10  54    Major software revision: billing, shipping, website all affected
11  58    Increased price
12  68    Introduced lower-priced subscriptions
13  79    Major software revision: billing, shipping, website all affected – this version crashed badly
14  89    Introduced the ability to put a subscription on hold
15  92    Increased number of [SKUs] to 6,000
16  100   Increased number of [SKUs] to 6,500
After trying a number of alternative factor solutions with LSA, the original customer comments were reduced to a series of 20 factors (k = 20). Determining an optimal number of factors remains an open problem in text mining research (Bradford, 2008). However, the service provider’s feedback was that they found the 20 factors helpful in understanding the customer comments. After SVD, we examined the high loading terms found in the LT matrix and labeled the factors. We had three experts in quality independently review and label (define) each factor. A subsequent meeting among the experts resulted in a labeling consensus. The twenty factors extracted by LSA with definitions, total frequency count, and
percentage of variance explained, are listed in Table 2. When a factor definition, generated in the text mining process, describes a failure in a business process or product, each observation in that factor is considered a fault or nonconformity in that business process. The rest of this analysis focuses on a subset of six factors (j = 6) that describe service quality attributes of a business process. The six quality-oriented factors are labeled F3, F6, F7, F11, F14, and F16. They represent 31 percent of cancellations and are the domain of a logistics business process. Of the remaining factors, most are personal reasons for not continuing, e.g., do not use the product enough (F1), moving (F2), or simply prefer the competitor (F9). At first glance, it is tempting to combine factors 6, 7, and 11 into one factor and describe it as "shipping delays". However, a manual reading of the customers' comments clearly reveals that the LSA methodology in this instance was able to delineate differences between these factors. It separated those dissatisfied with the mail delivery process (F6) from those disgruntled with delays in the shipping and receiving area (F7), and from those dissatisfied because of delays in product availability (F11). Such differentiation in logistics services is likely helpful for developing appropriate interventions. Factor F3 is a combination of positive and negative sentiments similar to that described in Hu and Liu (2004) and Bollegala et al. (2013). It requires additional categorization before analysis. While analysis through control charting of qualitative data with mixed sentiment is noteworthy, we elected to simplify this demonstration and will leave control charts on bimodal data to future research. This leaves five factors of data, F6, F7, F11, F14, and F16, all describing topics of business processes in the logistics function with singular semantic sentiment, for use in this illustration.
5. WORKING DATA FOR CONTROL CHARTS The relationship between the individual customer comments and the factors was established based on the factor loadings in the LD matrix. As one scans the columns of factor-loading data found in the LD matrix, the data do not abruptly terminate. At the top of the column are those documents that most closely relate to the factor because of the terms the author (customer) used. As one scans down the column, variation in the mixture of terms used by the author reduces the association of the document to the factor. In the case of weaker loading, the author might be more terse, verbose, or abrupt in writing style about that factor. LSA, being a mathematical procedure, then associates the document to the factor, but at a reduced factor loading when the term usage does not closely match the composition of the factor in the LT matrix. Documents with weaker factor loadings do not share a strong association with the semantic definition of the factor. However, these documents are mathematically associated with the factors because they share a limited number of the factor's terms. We consider this noise within the factor. To prevent the injection of noise into the analysis, a minimal threshold value was established, and only those documents that exceeded the threshold are considered high loading and were included in subsequent analytic steps. Initially, threshold values θj for factor-loading components were determined empirically for each factor j by identifying the level at which comments stopped being relevant to the factor. Because of the small variation in the threshold values across all the factors in our data set, a common threshold value, θD, equal to 0.2702 was used. This value represents the point where, on average, each document loads on exactly one factor.
Table 2
Factors resulting from LSA-based text mining

Factor  Label                                                      Frequency  Percent Variance
                                                                   Count      Explained
F1      I do not use enough [SKUs]                                 770        9.86
F2      I am moving                                                390        6.91
F3*     The service (mixed comments)                               475        6.72
F4      Prefer [service] from store                                573        5.92
F5      Do not use enough to make it worthwhile                    517        5.87
F6*     Wait time too long for [SKU] to arrive (mail)              470        5.63
F7*     Delays in shipping and receiving                           545        5.61
F8      Duplicate, fraud, or unused account                        444        5.48
F9      Prefer competitor                                          452        5.07
F10     I do not have enough time to use                           330        4.97
F11*    Product availability waiting time is too long              415        4.66
F12     The service is great (with sub reason)                     412        4.39
F13     Will return after an extended absence                      442        4.16
F14*    The customer wants a more extensive selection              310        3.79
F15     Trial membership ending, not continuing                    279        3.78
F16*    Received damaged products                                  317        3.67
F17     Just wanted to try the service                             312        3.64
F18     Costs too much. Budget constraints                         257        3.41
F19     Price increases. Want lower prices                         254        3.25
F20     Sign-up issues (accidental sign-up, did not sign up,       251        3.19
        will sign up later)
Total                                                              8222       100

Note: Quality factors are marked with an asterisk (*).
For an analysis of the distribution of the LD matrix, we convert LD to a discrete form by considering the comments whose loadings exceed the threshold value θD. When a comment's loading value exceeds θD it is recoded as 1, while all comments with loadings that fall below θD are coded 0. Then, we let nj be the count of high-loading comments for factor j, defined using the indicator function I(lij ≥ θD) that compares the loading of the ith comment on the jth factor (lij) to θD:

nj = ∑i I(lij ≥ θD),  with  N = ∑j nj = ∑j ∑i I(lij ≥ θD).    (4)

The sum of all high-loading comments is fixed to N based on the θD threshold's value. Then, the observed probability for any given factor j, i.e., the observed probability that a randomly selected comment has a loading that exceeds threshold θD, is the count of all comments loading on factor j over the total high-loadings count N, that is, pj = nj/N, where pj ≥ 0 and ∑j pj = 1, for j = 1,…,k. Each comment may discuss factor j with probability pj independent of whether or not it discusses factor i with probability pi, when j ≠ i. Therefore, the k factors correspond to k independent Bernoulli processes characterized by the random variables Bj, j = 1,…,k. Table 3 illustrates the structure of our data set. The left panel shows factor loadings for N comments (shown as rows) and k factors (shown as columns). Each comment can load highly on zero, one, more than one, and up to k factors. Looking at Table 3 in a column-wise fashion, the total loadings count for each factor is nj as shown in (4). At the data collection point, had the company required each customer to pick 1 out of k available issues, the distribution of customer-reported issues would have been multinomial over k outcomes. However, the current approach, which allows the issues to occur naturally in the voice of the customer in its original, unstructured text form, leads to a set of k independent Bernoulli variables Bj, j = 1,…,k, representing whether or not each of the k identified concepts is raised by the customer comment at a significant level. Looking at each factor separately allows us to construct a control chart where the process that produces comments loading high on factor j is monitored. The basis for such a control chart is the binary data shown in the right panel of Table 3. The k independent Bernoulli variables shown in Table 3 are not identical because they have different probabilities pj.
Table 3
k independent Bernoulli random variables and their relationship to the factor loadings

          LSA factor loadings            |        Bernoulli outcomes
Comment   F1     F2     …    Fk          | Trial   B1     B2     …    Bk
C1        l1,1   l1,2   …    l1,k        |   1     0      0      …    1
C2        l2,1   l2,2   …    l2,k        |   2     1      0      …    0
…         …      …      …    …           |   …     …      …      …    …
CN        lN,1   lN,2   …    lN,k        |   N     1      1      …    0
High-loading                             | Success count:
comment count:  n1  n2  …  nk            |         n1     n2     …    nk
                                         | P(success):
                                         |         n1/N   n2/N   …    nk/N
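The column-wise counts of Table 3 can be sketched directly from a binary outcome matrix. The matrix below is a toy example, not the case data; note that N here is the total high-loading count across all factors, following Eq. (4), so the pj sum to 1.

```python
# Sketch of Table 3's column-wise counts: given the Bernoulli (0/1) outcomes,
# nj is the high-loading count for factor j and pj = nj / N, where N is the
# total high-loading count across all factors. Toy data only.
binary = [
    [1, 0, 1],   # comment loads on factors 1 and 3
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],   # comment loads on no factor
]
k = len(binary[0])
n = [sum(row[j] for row in binary) for j in range(k)]  # nj per factor
N = sum(n)                                             # total high loadings
p = [nj / N for nj in n]                               # observed pj, sums to 1
print(n, N, p)  # [2, 2, 1] 5 [0.4, 0.4, 0.2]
```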
Independence of the processes is evaluated with Pearson's correlation coefficient. Of the 190 possible correlations, only 4 have |R| ≥ 0.1, and only 13 fall in the range 0.05 ≤ |R| < 0.1. Of those 17 correlations, the highest coefficient is 0.268. Therefore, the assumption of independence appears reasonable. The time series nature of the comments allowed us to perform counts of the observations i for each factor j within each time period t (in our case study, one week) that exceed the common threshold θD. The number of high-loading documents (customer comments in this application) is

Dj,t = ∑i I(lij(t) ≥ θD),    (5)

where I(lij(t) ≥ θD) is an indicator function for the ith comment on factor j during period t that loads greater than or equal to the loading threshold θD, and nt is the total number of customer comments sampled in period t. In this case study, the number of factors is k = 20, and the number of periods (weeks) is T = 116. The counting process also included a cumulative count of all high-loading documents per time period t across all factors j:

mt = ∑j ∑i I(lij(t) ≥ θD).    (6)

The counts in (5) and (6) are combined to produce the proportions of high-loading documents specifically addressing factor j over all high-loading documents in time period t:

pj,t = Dj,t / mt.    (7)
For a first overview of the working data, Fig. 2 presents a time series plot of the high-loading documents Dj,t as defined in (5), for factor j = 6. Fig. 2 shows a high count level in the first six weeks as well as increased activity in weeks 41, 85, and 95. Throughout this 116-week period, the service provider's membership was increasing, rendering a traditional time series plot ineffective for presenting the data. Fig. 3 presents the proportions of high-loading documents as defined in (7). An overall decreasing trend is observable in Fig. 3 that was not present in Fig. 2. Also note that the increased activity now appears in weeks 41, 58, and 88 when we consider the proportions of high-loading documents. While for some other applications the total counts would be valuable, in this case we focused on the proportions. This makes more sense in this application because the organization had a growing number of customers over the observed 27-month period. In different business environments it is potentially advantageous to monitor the counts as well as the proportions.
Fig. 2 Time series plot of factor F6 observations
Given the Dj,t and mt variable data, we are now ready to transition our approach from the LSA to SPC issues. In the next section, we show how control charts are used to monitor the factor data.
We also describe how such monitoring allows evaluation of the company’s
initiatives via stage testing.
Fig. 3 Time series plot of factor F6 as a proportion of all observations
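The per-period counts and proportions of Eqs. (5)–(7) can be sketched as follows. The weekly data below are invented for illustration; in practice the 0/1 rows would come from thresholding the LD loadings within each week.

```python
# Sketch of Eqs. (5)-(7): per-period counts Dj,t of high-loading comments
# for factor j, the per-period total mt, and the proportion pj,t = Dj,t / mt.
# 'weekly_binary' maps a period t to its comments' 0/1 factor indicators;
# the data are illustrative, not the case-study comments.
weekly_binary = {
    1: [[1, 0], [1, 1], [0, 0]],   # week 1: three comments, two factors
    2: [[0, 1], [1, 0]],           # week 2: two comments
}

def proportions(weekly):
    out = {}
    for t, rows in weekly.items():
        k = len(rows[0])
        D = [sum(r[j] for r in rows) for j in range(k)]  # Dj,t, Eq. (5)
        m = sum(D)                                       # mt,   Eq. (6)
        out[t] = [d / m for d in D]                      # pj,t, Eq. (7)
    return out

print(proportions(weekly_binary))
```

Each pj,t series (one per factor) is what gets plotted on that factor's p-chart in the sections that follow.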
6. CHART SELECTION FOR TEXT At this point, the data are fully converted to binomial count data, defined as nonconforming for a specific cause, and loadable to the chart of the researcher's choice. The p-chart was selected for this illustration for several reasons. First, the objective of the analysis is to demonstrate the application of control charts to the monitoring of text-mined customer comment data, and a basic p-chart allows such a demonstration without the complication of having to explain and develop a more complex control chart. Second, the p-chart is a fundamental control chart for examining these data and is well understood by many professionals from a variety of specializations. While the np-chart would have performed as well as the p-chart, the c-chart and u-chart are inappropriate in this study because we consider each observation a nonconformity and we are not counting nonconformities per unit. An added advantage of the p-chart is its straightforward
interpretation – it shows variations as a fraction of nonconforming output (Duncan, 1986). From a practitioner's perspective, the chart is easy to construct, easy to interpret, and well understood (Montgomery, 1996). One limitation of the p-chart is that a sufficient sample size n is required to avoid alarming on a single or small number of nonconformities if p̂, the actual process value, is low (Duncan, 1986). One strategy for avoiding small values of n is to adjust the time frame used for taking samples. Our data set, like many customer comment data sets, was exceptionally large and did not present a sample size problem. In fact, our data set was sufficiently large (>820,000) to allow sampling on a daily basis. However, in most business organizations, performing this analysis daily is not recommended. Daily review may not reveal central themes and as a result can miss changes that are strategic in nature.
Monthly processing is also not recommended because such a long period may delay detection of shifts caused by competitor actions. For example, while a customer may be satisfied with your delivery time today, they may be even more satisfied with a competitor that suddenly cuts its delivery time. Compared with machine processes, customer sentiment typically develops over longer time periods.
Finally, the analytic processing may require a significant commitment of human resources: skilled staffing was self-reported as a barrier to conducting big data analytics by 46 percent of the respondents in a recent survey (Russom, 2011). For these reasons, we experimented with different alternatives and chose weekly interval sampling because it is a convenient, natural time period that fits the process change timing concerns. The weekly sampling plan worked well, resulting in samples of 30 to 60 observations per factor in the early period of the analysis. The number of observations per factor increased to a range of 100 to 190 per week in the later time periods because of the increase in subscribership. While an
interval sampling plan could be implemented, with samples of size n taken at specified times t, the chart also works well with variable sample sizes. This aspect of p-charts allows us to automate processes and, if desired, implement a 100-percent sampling plan. Since each factor defines a different issue with a business process, we plot the data of each factor separately. The p-chart uses as a baseline the number of nonconforming units D_{j,t}, summed across all factors j and divided by the sample size m_t, such that (Montgomery, 1996)

    p̄_t = Σ_j D_{j,t} / m_t .    (8)
In many settings, e.g. service quality, data populated on control charts will not have predetermined objective measurements, and the researcher or management has to establish a standard acceptable level of p. In this research, we use the mean of the data within each stage as the centerline. For the control limits, we use the three-sigma standard given by (Montgomery, 1996)

    UCL, LCL = p̄ ± 3 √( p̄ (1 − p̄) / n ) .    (9)
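To make Eqs. (8) and (9) concrete, the sketch below (illustrative Python, not the authors' code; the weekly counts in the usage line are hypothetical) computes the pooled centerline and the variable-width three-sigma limits for weekly samples:

```python
import math

def p_chart_limits(nonconforming, sample_sizes):
    """Centerline and per-sample three-sigma limits for a p-chart.

    nonconforming: counts of nonconforming comments per week (D_t)
    sample_sizes:  weekly sample sizes n_t (may vary week to week)
    """
    p_bar = sum(nonconforming) / sum(sample_sizes)   # Eq. (8): pooled fraction
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)   # binomial standard error
        ucl = p_bar + 3 * sigma                      # Eq. (9)
        lcl = max(0.0, p_bar - 3 * sigma)            # LCL truncated at zero
        limits.append((lcl, ucl))
    return p_bar, limits

# Hypothetical weekly data: nonconforming counts and sample sizes
p_bar, limits = p_chart_limits([4, 6, 3, 8], [55, 60, 48, 62])
```

Because the limits are recomputed from each week's n, the chart accommodates the variable sample sizes discussed above without further adjustment.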
Because we use stages in the charts, shifts in the mean are tested under the hypotheses H0: p1 = p2 and H1: p1 ≠ p2 using the z statistic

    Z = (p̂1 − p̂2) / √( p̂ (1 − p̂) (1/n1 + 1/n2) ),  where  p̂ = (n1 p̂1 + n2 p̂2) / (n1 + n2)    (10)

(Montgomery, 1996). We evaluate results at the 5-percent significance level, rejecting H0 when |Z| exceeds the standard normal critical value Z0.025 = 1.96.
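Equation (10) can be sketched as follows (again illustrative, not the authors' implementation); the usage line applies it to the factor 7 stage 1 versus stage 2 values reported in Table 4:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Z statistic of Eq. (10) for H0: p1 = p2 vs. H1: p1 != p2."""
    p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)                 # pooled estimate
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))  # pooled standard error
    return (p1 - p2) / se

# Factor 7, stage 1 (n1 = 310) vs. stage 2 (n2 = 706), values from Table 4
z = two_proportion_z(0.122581, 310, 0.066572, 706)
```

Here |Z| ≈ 2.97 exceeds 1.96, matching the "Reject" conclusion reported for that comparison.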
7. P-CHARTS AND STAGE TESTING
The quality related factors in Table 2 (F6, F7, F11, F14, and F16) each describe a specific quality issue found within the customer comments. Three of those five factors involve time. The time oriented factors are: the waiting time for mailed products is too long (F6), delays in the shipping and handling process (F7), and the availability wait time for preferred products is too long (F11). Logistics activities provide place and time utility (Stock and Lambert, 2001), and each of these factors could be considered a nonconformity category in time utility. During data collection, the company experienced several strategic and process changes that altered the mix of the logistics system and, as a result, could have affected the time utility and, more specifically, the customer service level. Of the items in Table 1, six changes have the potential to shift the mean within these time oriented factors. These six changes include the opening of additional distribution centers in weeks 7 and 38. These actions pushed supply closer to the customer and theoretically reduced the transportation time. The initiation of shipping from local brick and mortar sites in week 48 also pushed supply closer to the customer, reducing transportation time.
However, this initiative could possibly conflict with the store employees' core competency. The software revisions in weeks 22, 54, and 79 were key initiatives that affected all facets of the business. If a shift in the mean occurs, then these events also signify a stage change in the p-charts. Process changes create a potentially uncomfortable experience and are often associated with a startup learning curve. Furthermore, intentionally undertaking a change for improvement requires a well-defined process (Evans and Lindsay, 2008). Therefore, it is possible that changes in the distribution options could increase the process mean if the implementation went badly or if the change were poorly received.
Fig. 4 presents the final staged p-charts for factors 6, 7, and 11. All computations associated with these factors are in Table 4. During stage testing, the initiation of product
Fig. 4. p-charts of factors F6, F7, and F11 with associated stage shifts
shipments from stores (week 48) and the second software roll-out (week 54) were not significant at the α = 0.05 level for either factor 6 (mail wait too long) or factor 11 (product availability wait too long). Since the initiation of shipments from local stores, in addition to the existing distribution centers, reduced the distance from product to customer, this process change was expected to reduce customer departures because of mail delivery delays. The failure to reduce the departure rate due to delays was surprising but potentially attributable to such issues as the learning curve at the local stores, conflicts in priorities at the local stores, and possible routing delays with local mail.
Stage testing in factor 7 (delays in shipping and receiving) found that the software deployment (week 22) and the initiation of shipments from stores (week 48) were not significantly different with respect to shipping and handling process delays. For factor 7, the technology applied did not reduce the process mean, and customers continued to depart because of delays in shipping and receiving at the same rate. While technology improvements, such as the major software deployment in week 22, are typically expected to improve the efficiency of a system, the system efficiency is limited by its weakest link.

Table 4 Parameter estimates and z-testing of stages for factors 6, 7, and 11

Factor 6
Stage  Week  p̄         UCL     LCL     n1    n2    p̂         Z         Conclusion
1      1     0.161290  0.3087  0.0138
2      7     0.107649  0.2626  0       310   706   0.124016  2.388644  Reject
3      22    0.053333  0.1550  0       706   900   0.077210  4.047477  Reject
4      38    0.082452  0.1991  0       900   473   0.063365  -2.104670 Reject
5      48    0.060729  0.2171  0       473   247   0.075000  1.050618  FTR
4Rev   38    0.075000  0.2474  0
6      54    0.065588  0.1667  0       720   1113  0.063680  -0.416140 FTR
4Rev2  38    0.069285  0.1730  0
7      79    0.037685  0.0863  0       1833  4458  0.048557  6.256838  Reject

Factor 7
Stage  Week  p̄         UCL     LCL     n1    n2    p̂         Z         Conclusion
1      1     0.122581  0.2541  0
2      7     0.066572  0.1912  0       310   706   0.083661  2.968923  Reject
3      22    0.052222  0.1528  0       706   900   0.058531  1.215931  FTR
2Rev   7     0.058531  0.1647  0
4      38    0.103594  0.2329  0       1606  473   0.068783  -3.403580 Reject
5      48    0.097166  0.2911  0       473   247   0.101389  0.271277  FTR
4Rev   38    0.101389  0.2990  0
6      54    0.078167  0.1878  0       720   1113  0.087289  1.720214  Reject
7      79    0.056528  0.1155  0       1113  4458  0.060851  2.701458  Reject

Factor 11
Stage  Week  p̄         UCL     LCL     n1    n2    p̂         Z         Conclusion
1      1     0.138710  0.2773  0
2      7     0.073654  0.2043  0       310   706   0.093504  3.279601  Reject
3      22    0.037778  0.1240  0       706   900   0.053549  3.169836  Reject
4      38    0.057082  0.1555  0       900   473   0.044428  -1.649750 Reject
5      48    0.072874  0.2430  0       473   247   0.062500  -0.831050 FTR
4Rev   38    0.062500  0.2210  0
6      54    0.067385  0.1697  0       720   1113  0.065466  -0.412980 FTR
4Rev2  38    0.065466  0.1664  0
7      79    0.037236  0.0856  0       1833  4458  0.045462  4.884080  Reject
In all three charts, only three violations of the upper control limit are present. In the F6 chart (Fig. 4), weeks 86 and 87 require evaluation by management. These out of control states are potentially associated with the software revision initiated in week 79. That revision, installed seven weeks prior, initially crashed badly. We would not expect customers to respond instantly to a software failure that impacted distribution, but to become irritated over time. This software change affected all facets of the business and possibly caused the out of control states observed in weeks 86 and 87. The opening of new distribution centers in week 38 also went badly. Those openings increased the number of distribution centers by 50 percent, after which the mean of all three factors increased. Nevertheless, even though weeks 86 and 87 were out of control, the mean of the stage initiated at week 79 was the lowest for any stage with respect to mail delays (factor 6) and product availability delays (factor 11). This supports the contention that the initiatives undertaken by the firm reduced the nonconformities. Factor 14 (Fig. 5) observations represent nonconformities because these customers wanted a more extensive product selection than was currently offered. While product mix is a marketing issue, an increase in the product SKUs carried by the company directly increases the workloads in the logistics system. There were several occasions during the data collection period on which the company expanded its product offerings. While these increases potentially reduced the frequency of factor 14 nonconformities, they could also result in nonconformities in other areas of the logistics system. The increases in product offerings occurred in weeks 9, 26, 47, 92, and 100. Parameter estimates and z-test data are in Table 5, while Fig. 5 has the final chart of the data. Stage testing found no significant difference in the means of the data until the increase in offerings in week 92.
Interestingly enough, at that time the mean of the factor shifted up. Since the company was actively increasing offerings all along, the expectation was that the mean would shift down. Observations in this chart are possibly best explained by the actions of a competitor, although the information needed to support that hypothesis is not available. The chart signals out of control conditions only twice, at weeks 90 and 91, just before the mean shifted up. While not out of control, four points within the first stage are worth further consideration: weeks 41, 53, 54, and 85. At various points along the data collection period, the company's marketing department was offering free trial subscriptions through nationwide marketing campaigns. We believe these four observations are the result of a sudden surge in subscriptions followed by cancellations stemming from the trial experience.
Fig. 5. Final p-chart of factor F14 after stage testing
Factor 16 reflects customers departing because they received damaged products. For example, changes in operator procedures, training of personnel, or changes in the packaging used in shipments can influence the amount of damaged product delivered. This data is presented in a single stage chart format using the factor grand mean p̄ = 0.0386256 as the centerline (Fig. 6). The chart signals only one out of control condition, when in week 28 it violates the upper control limit, but observations in weeks 42, 43, 44, and 50 each completed a run of nine points in a row on the same side of the centerline. In addition, the tail area beyond week 85, which visually
Table 5 Parameter estimates and z-testing of factor 14

Factor 14
Stage  Week  p̄         UCL     LCL  n1    n2    p̂         Z         Conclusion
1      1     0.023419  0.0830  0
2      9     0.033069  0.1279  0    427   756   0.029586  -0.940740 FTR
1Rev1  1     0.029586  0.1194  0
3      26    0.021626  0.0976  0    1183  1156  0.025652  1.217375  FTR
1Rev2  1     0.025652  0.1082  0
4      47    0.032712  0.0850  0    2339  2201  0.029075  -1.415060 FTR
1Rev3  1     0.029075  0.0785  0
5      92    0.046473  0.0964  0    4540  1205  0.032724  -3.017660 Reject
6      100   0.049553  0.1050  0    1205  2462  0.048541  -0.407670 FTR
5Rev1  92    0.048541  0.1034  0
appears to have a tighter standard deviation, has a cause we cannot determine. This pattern occurs in all the charts and might suggest stability in the system.
Fig. 6. p-chart of factor F16 (customers receiving damaged products)
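The nine-points-in-a-row run rule applied above is simple to automate; the following sketch (a hypothetical helper, not part of the original analysis) reports the weeks at which a run of nine consecutive points on one side of the centerline completes:

```python
def nine_in_a_row(proportions, centerline):
    """Return 1-based week indices where a run of nine consecutive points
    on the same side of the centerline completes (supplementary run rule)."""
    signals, run, side = [], 0, 0
    for week, p in enumerate(proportions, start=1):
        s = 1 if p > centerline else (-1 if p < centerline else 0)
        # extend the run if the point falls on the same side, else restart it
        run = run + 1 if (s == side and s != 0) else (1 if s != 0 else 0)
        side = s
        if run >= 9:
            signals.append(week)
    return signals

# Hypothetical series: ten consecutive points below a centerline of 0.0386
weeks = nine_in_a_row([0.02] * 10, 0.0386)
```

Each signaled week would then be investigated for an assignable cause, just as an upper-control-limit violation would be.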
8. EWMA CHARTS AND TEXT DATA The p-chart has one added disadvantage we have not previously addressed – it detects large shifts well but not small shifts. For the data set at hand, this has not been an issue because the full data set consists of over 820,000 observations. In other settings, the data set may not be so large, meaning the p-chart could be disadvantaged. For this study, we also illustrate the use of EWMA charts on text data; however, recall from section 2 that a more comprehensive
Fig. 7. Factor F7 (delays in shipping and receiving). Panel A: EWMA with λ = 0.10 and L = 2.814. Panel B: p-chart
application of the EWMA to text data is available in Ashton et al. (2013b). The EWMA chart performs relatively better in detecting small shifts, particularly when the value of λ is small (Montgomery, 1996). In many instances the EWMA is problematic for text data, given cross-loading topics (attribution) and terse text tendencies. For the EWMA chart demonstrated here, we set λ = 0.10, set the control limit multiplier L to 2.814, and use a moving range of 3 (Lucas and Saccucci, 1990). These values are selected to give the greatest sensitivity to small shifts, with an expected average run length ARL0 ≈ 500 when no shift is present yet a very short run length ARL1 ≈ 10.3 when the system goes out of control by one standard deviation (Lucas and Saccucci, 1990). For this illustration, only one factor of data, factor F7, is presented. In this instance, the EWMA presents an environment that is consistent with that presented by the p-chart (see Fig. 7). Factor F7's p-chart (Panel B) is also provided for convenient comparison.
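For completeness, the EWMA recursion and its time-varying limits with the parameters used here (λ = 0.10, L = 2.814) can be sketched as follows. This is a simplified illustration that assumes a known in-control mean and standard deviation, rather than estimating them from the data:

```python
import math

def ewma_chart(xs, mu0, sigma, lam=0.10, L=2.814):
    """EWMA statistics z_t = lam*x_t + (1-lam)*z_{t-1} with the exact
    time-varying control limits (Lucas and Saccucci, 1990).

    xs: observed weekly proportions; mu0, sigma: in-control mean and
    standard deviation, assumed known for this sketch.
    """
    z, out = mu0, []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z                  # EWMA recursion
        width = L * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        out.append((z, mu0 - width, mu0 + width))    # (statistic, LCL, UCL)
    return out

# Hypothetical in-control series around mu0 = 0.08
points = ewma_chart([0.081, 0.079, 0.082], mu0=0.08, sigma=0.01)
```

The exact limits widen toward their asymptotic value over the first few samples, which is useful for detecting start-up problems in short text-data series.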
9. CONCLUSION In this study, we develop and present an approach to evaluate textual data with multiple latent causes via control charts. We achieve that by extracting the concept structure using LSA. In the singular value decomposition step of LSA, we extract factor loadings for the quality related factors, discretize them, and count the number of high-loading customer comments per time frame of interest. These counts are used to populate the control chart. Since the selected factors are associated with a specific cause for nonconformity, the resulting control charts allow evaluation of the company's efforts to improve its business processes. This study examined data collected during an on-line subscription cancellation process; however, the method developed has broader application. In this study, the customer was required to complete a text box before the cancellation process would complete. Given the development of the internet, personal communication devices, and social networking, it is possible to collect business oriented data related to product quality, service quality, perceptions, and offerings on a nearly continuous basis. Many such venues use short text responses and rely on compatible technologies such as the short message service (SMS) or limits specified by the hosting service (Twitter). As a result of the imposed message limits, instrument development becomes crucial. Cocco and Tuzzi (2013) found a question-wording effect as a result of the method of survey completion. That study also found that the question-order effect increased depending on the survey completion mode. Cocco and Tuzzi's (2013) work was based on Likert-scale instruments; however, their work suggests instrument development designed for text data collection should consider the behavioral norms of the technology user.
In recent years, effort has gone into the development of hybrid survey instruments that collect a mixture of Likert-scale and open-ended question data (Fielding et al., 2013). While our current method was developed
with data exclusively generated from open-ended questions, applying the control charting technique presented here to the text portion of a hybrid instrument is a potentially valuable approach. Consistent with the text analysis method presented, two business oriented text analytics strategies present themselves. The first strategy depends on a text-only response instrument designed consistent with Cocco and Tuzzi's (2013) recommendations. As a first step of analysis, apply one of the many sentiment analysis techniques (Feldman, 2013; Scharkow, 2013). Separate the respondents based on the sentiment score into at least three different subsets reflecting positive, neutral, and negative sentiment. Text-mine the three subsets independently, thereby uncovering the cause(s) of the opinion. Finally, monitor the results via control chart. The second strategy depends on the development of a hybrid instrument (Fielding et al., 2013). When the data are analyzed, text responses could be compartmentalized into subsets according to the Likert response. The Likert response item essentially functions as a sentiment analysis score and allows the respondents to self-select among categories such as positive, neutral, and negative. Text mining of these separate subsets would then reveal the cause(s) of the sentiment. Those results could then be monitored by a control chart. If either of these instruments were deployed into a mobile environment, then it is likely the instrument items will need to be consistent with Cocco and Tuzzi's (2013) approach. Such an instrument could be deployed into the market on a regularly scheduled, predefined period of time, generating as many samples as necessary. Selection of a particular control chart for text mined data is situationally dependent and compounded by the data structure and the monitoring objectives. In the present research, the customers frequently attribute multiple causes for nonconformity in their comments.
Consistent with such data, we describe the occurrence of comments that are relevant to nonconformity categories by a number of independent Bernoulli processes. In other applications, it is possible that the survey subject or survey questions are narrow in scope, resulting in data that consist of exactly one explanation selected out of a set of available nonconformity explanations. In such a case, the process would be multinomial. Alternatively, when the objective is to monitor how many nonconformities occur per comment, the data can be collapsed into counts of nonconformities per comment. In such a case, the Poisson binomial distribution would be applicable because it describes the sum of k independent non-identical Bernoulli processes. This study is an important effort toward opening the possibility of such options for future work. The present work makes an important contribution to the body of literature because it is the first work that takes unstructured textual feedback and establishes a mechanism for transitioning the analysis of that text into actionable SPC. In a traditional analytic approach to textual data, manual content analysis is labor intensive and more subject to interpretation bias than the LSA control chart approach we develop and illustrate in this work. This LSA based approach represents a contribution to manufacturing and service quality because it provides managers and researchers a means to systematically analyze text based data without the bias and labor-intensity problems associated with traditional manual text analysis systems.
The use of raw textual data is also potentially superior to the use of structured survey instruments because, in many instances, the extracted factors represent nonconformities that were not previously anticipated, and analysis can proceed to root cause analysis. It is also a reasonably efficient process given the power of current desktop computers, and it is likely to become increasingly important in this era of Big Data and the emerging era of mega data.
REFERENCES Arora, S., Ge, R., Moitra, A.: Learning topic models – going beyond SVD. In Proc. IEEE 53rd Annu. Symp. on Found. of Comput. Sci., New Brunswick, NJ, 1-10 (2012) Ashton, T., Evangelopoulos, N., Prybutok, V.: Extending monitoring methods to textual data: A research agenda. Qual. Quant., (2013a). doi 10.1007/s11135-013-9891-8 Ashton, T., Evangelopoulos, N., Prybutok, V.: Exponentially weighted moving average control charts for monitoring customer service quality comments. Int. J. of Serv. and Stand. 8(3), 230-246 (2013b) doi 10.1504/IJSS.2013.057237 Bollegala, D., Weir, D., Carroll, J.: Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans. Knowl. Data Eng., 25(8), 1719-1731 (2013) Bradford, R.: An empirical study of required dimensionality for large-scale latent semantic indexing applications. CIKM ’08: Proc. of the 17th ACM Conf. on Inf. and Knowl. Manag., ACM, pp. 153-162, New York (2008) Cocco, M., Tuzzi, A.: New data collection modes for surveys: a comparative analysis of the influence of survey mode on question-wording effects. Qual. Quant., 47, 3135–3152 (2013) Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. J. of the Am. Soc. for Inf. Sci., 41(6), 391-407 (1990) Duncan, A.: Quality control and industrial statistics (5th ed.). pp. 436-451. Irwin, Homewood, IL. (1986) Evangelopoulos, N.: Latent Semantic Analysis. Wiley Interdisc. Rev.: Cogn. Sci., 4(6), 683-692 (2013). doi 10.1002/wcs.1254 Evangelopoulos, N., Zhang, X., Prybutok, V.: Latent semantic analysis: Five methodological recommendations. Eur. J. of Inf. Syst., 21, 70-86 (2012). doi 10.1057/ejis.2010.61 Evans, J., Lindsay, W.: Managing for quality and performance excellence (7th ed.). pp. 462. Thomson/South-Western, Mason, OH, (2008) Feldman, R.: Techniques and applications for sentiment analysis. Commun. 
of the ACM, 56(4), 82-89 (2013) Fielding, J., Fielding, N., Hughes, G.: Opening up open-ended survey data using qualitative software. Qual. Quant., 47, 3261–3276 (2013) Heider, F.: The psychology of interpersonal relations. New York: John Wiley & Sons (1958) Hu, M., Liu, B.: Mining and summarizing customer reviews. Proc. of Knowl., Discov., and Datamining, pp.168-177. Association for Computing Machinery, Seattle, WA. (2004) Hu, X., Cai, Z., Wiemer-Hastings, P., Graesser, A.C., McNamara, D.S.: Strengths, Limitations, and Extensions of LSA. In: T. K. Landauer, D.S. McNamara, S. Dennis, W. Kintsch, (Eds.), Handbook of latent semantic analysis, pp. 401-425. Lawrence Erlbaum Associates, Mahwah, NJ (2007) Landauer, T.: LSA as a theory of meaning. In: T. K. Landauer, D.S. McNamara, S. Dennis, W. Kintsch, (Eds.), Handbook of Latent Semantic Analysis, pp. 1-32. Lawrence Erlbaum Associates, Mahwah, NJ (2007) Liu, B.: Sentiment Analysis and Opinion Mining. San Rafael, CA: Morgan & Claypool, (2012) Lo, S.: Web service quality control based on text mining using support vector machine. Expert Syst. with Appl., 34, 603-610 (2008) Lucas, J., Saccucci, M.: Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements. Technometrics, 32 (1), 1-12 (1990)
Manning, C., Raghavan, P., Schütze, H.: Matrix decompositions and latent semantic indexing. In: Introduction to Information Retrieval, pp. 403-420. Cambridge University Press, New York, NY (2008) Montgomery, D.: Introduction to statistical quality control (3rd ed.). pp. 252-266. John Wiley & Sons, New York, NY (1996) Russom, P.: Big data analytics. TDWI Best Practices Report. The Data Warehouse Institute, Reston, VA. [Online]. Available: http://tdwi.org/research/2011/09/best-practices-report-q4big-data-analytics.aspx (2011, Fourth Quarter) Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. and Manag., 24(5), 513-523 (1988) Salton, G.: A vector space model for automatic indexing. Commun. of the ACM, 18(11), 613-620 (1975) Scharkow, M.: Thematic content analysis using supervised machine learning: An empirical evaluation using German online news. Qual. Quant., 47, 761-773 (2013) Schott, J.: Systems of linear equations. In: Matrix analysis for statistics (2nd ed.), pp. 221-254. John Wiley & Sons, Hoboken, NJ (2005) Shehata, S., Karray, F., Kamel, M.S.: An efficient concept-based mining model for enhancing text clustering. IEEE Trans. Knowl. Data Eng., 22(10), 1360-1371 (2013) Sidorova, A., Evangelopoulos, N., Valacich, J., Ramakrishnan, T.: Uncovering the intellectual core of the information systems discipline. MIS Q., 32(3), 467-A20 (2008) Stock, J., Lambert, D.: Strategic logistics management (4th ed.), pp. 9-29. McGraw-Hill/Irwin, Boston, MA (2001) Szarka, J., Woodall, W.: On the equivalence of the Bernoulli and geometric CUSUM charts. J. of Qual. Technology, 44(1), 54-62 (2012) Zipf, G.K.: Human behavior and the principle of least effort. Addison-Wesley Press, Cambridge, MA (1949)