Exploratory tobit factor analysis for multivariate censored data
This article was downloaded by: [Duke University Libraries] On: 14 August 2012, At: 12:19 Publisher: Psychology Press Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Multivariate Behavioral Research Publication details, including instructions for authors and subscription information:
Exploratory Tobit Factor Analysis for Multivariate Censored Data Wagner A. Kamakura & Michel Wedel
Version of record first published: 10 Jun 2010
To cite this article: Wagner A. Kamakura & Michel Wedel (2001): Exploratory Tobit Factor Analysis for Multivariate Censored Data, Multivariate Behavioral Research, 36:1, 53-82 To link to this article:
Downloaded by [Duke University Libraries] at 12:19 14 August 2012
Multivariate Behavioral Research, 36 (1), 53-82
Copyright 2001, Lawrence Erlbaum Associates, Inc.
University of Groningen and University of Michigan
We propose Multivariate Tobit models with a factor structure on the covariance matrix. Such models are particularly useful in the exploratory analysis of multivariate censored data and the identification of latent variables from behavioral data. The factor structure provides a parsimonious representation of the censored data and reduces the dimensionality of the integration required in evaluating the likelihood. In addition, the factor model parameters lend themselves to substantive interpretation and graphical display. The models are estimated with simulated maximum likelihood. Applications to the prescription of pharmaceutical products and the analysis of multi-category buying behavior are provided.
Factor analysis has been one of the favored methods for data analysis among behavioral researchers, and is a fundamental tool for psychometricians (cf. Mulaik, 1972). Factor analysis models were originally developed for continuous (normally distributed) observed variables, but were later extended to binary outcome variables (Bartholomew, 1980), thus allowing behavioral researchers to factor analyze data with continuous or discrete variables. However, some situations require the analysis of data in which each observed variable is a discrete-continuous mixture. This usually happens when analyzing behavioral data. For example, one of the data sets used in the present paper comes from a study pertaining to the prescription of drugs by physicians and contains information on whether or not a physician prescribed a drug and, if so, the volume prescribed. A second example that we provide comes from marketing, where purchase volumes of products in multiple categories are analyzed.
For correspondence regarding this article, the first author may be reached at 108 Pappajohn Business Building, University of Iowa, Iowa City, 52242-1000. The authors are grateful to the editor and two anonymous reviewers for helpful suggestions and comments.
The data contain information on whether a product was purchased, and if so in what volume. Behavioral data, commonly collected and analyzed in the human and social sciences, for example in epidemiology (Donovan, 1993), genetics (Waller & Muthén, 1992), health (Mingshan, 1999), psychology (Sher & Wood, 1996), economics (Burket, 1998) and management (Van den Berg & Richardson, 1999), are inherently non-negative and sometimes have a large proportion of zeros. Treating these zeros as missing data in a factor analysis will obviously produce a biased representation of the data, because the censoring mechanism producing the zeros contains information about the factor pattern. Moreover, analyzing these data with a standard factor model produces biased estimates, because the severe non-normality that results violates the standard assumptions on the residuals (Muthén, 1989).
This type of data with zero observations is sometimes referred to as mixed type data and has recently attracted much attention in the statistical literature. Examples are the work of Sammel, Ryan and Legler (1997), Sammel and Ryan (1996), Fitzmaurice and Laird (1995), Cox and Wermuth (1992), Arminger and Küsters (1988), and Lance, Cornwell and Mulaik (1988). There, mixed type data are modeled through (bivariate) binomial and normal distributions for the zero and non-zero data values. However, problems in the analysis of mixed type data were already recognized by Tobin (1958). The Tobit model has the advantage of providing an explicit link between the data-generating mechanism of the zero and non-zero data by offering a variety of specifications of latent variables and censoring mechanisms and restricting the distribution of the non-censored data to have positive support.
Amemiya (1985, Ch. 10) provides a classification of Tobit models that is based on the form of the likelihood function, the number of variables and whether or not each of them drives censoring. The two most common classes of Tobit models are the ones classified by Amemiya as Type-1 and Type-2 models. The Type-1 Tobit is: $y^* = x'\beta + \varepsilon$, with $y = y^*$ if $y^* > 0$ and $y = 0$ otherwise, where $y^*$ is observed only if it is larger than zero and $\varepsilon \sim N(0, \sigma^2)$. The Type-2 Tobit is defined as: $y_1^* = x_1'\beta_1 + \varepsilon_1$, $y_2^* = x_2'\beta_2 + \varepsilon_2$, where only the sign of $y_1^*$ is observed. Further, $y = y_2^*$ is observed if $y_1^* > 0$ and $y = 0$ otherwise. Both models thus explicitly link the distributions of the zero and nonzero data through the censoring mechanism, with the Type-2 model specifying a different latent variable for the censoring mechanism. The Type-2 model is thus the more general of the two and is applicable in particular in cases where a different data-generating mechanism is thought to underlie the zero and nonzero data parts. The increasing availability of micro-level behavioral data has greatly stimulated the interest in these Tobit models in the behavioral sciences (e.g., Amemiya, 1985, Chapter 10; Jones & Possnett, 1991; DeSarbo & Choi, 1999).
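The difference between the two Tobit types can be made concrete with a small simulation. The following is a minimal sketch, not part of the original article: it assumes a single standard-normal covariate shared by both equations, a unit-variance error in the censoring equation, and an illustrative error correlation `rho` in the Type-2 case. All function and parameter names are hypothetical.

```python
import random


def simulate_type1(n, beta0, beta1, sigma, seed=0):
    """Type-1 Tobit: y* = beta0 + beta1*x + eps; y = y* if y* > 0, else 0."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        ys.append(y_star if y_star > 0 else 0.0)
    return ys


def simulate_type2(n, b1, b2, sigma2, rho, seed=0):
    """Type-2 Tobit: the sign of y1* decides whether y2* is observed.

    b1 and b2 are (intercept, slope) pairs. The error of y1* has unit
    variance, the error of y2* has standard deviation sigma2, and rho is
    their correlation (an illustrative assumption, not from the article).
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = sigma2 * (rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        y1_star = b1[0] + b1[1] * x + e1   # censoring equation
        y2_star = b2[0] + b2[1] * x + e2   # outcome equation
        ys.append(y2_star if y1_star > 0 else 0.0)
    return ys


def share_of_zeros(ys):
    """Fraction of observations censored at zero."""
    return sum(1 for y in ys if y == 0.0) / len(ys)


type1 = simulate_type1(20000, beta0=0.0, beta1=1.0, sigma=1.0)
type2 = simulate_type2(20000, b1=(1.0, 0.5), b2=(3.0, 1.0), sigma2=1.0, rho=0.5, seed=1)
print(share_of_zeros(type1), share_of_zeros(type2))
```

In the Type-1 sketch the share of zeros is tied to the mean of the latent outcome, whereas in the Type-2 sketch it is governed by the separate censoring equation.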
The estimation schemes that have been proposed for Tobit models involve two-stage least-squares (Heckman, 1976), nonlinear least-squares (Wales & Woodland, 1980), Markov Chain Monte Carlo methods (Chib, 1992) and maximum likelihood (Amemiya, 1973). Maximizing the Tobit likelihood offers advantages, since it provides consistent and efficient estimates (cf. Amemiya, 1985, Ch. 10), while for some parameterizations the likelihood is globally concave in the parameters (Olsen, 1978). The Tobit likelihood needs to be maximized iteratively using, for example, the Newton-Raphson algorithm. However, for multivariate Tobit models the numerical evaluation of the likelihood has been difficult or even impossible, since it involves high dimensional integrals. This seems to have hampered the application of Tobit models to high dimensional multivariate behavioral data subject to censoring. Multivariate Tobit models have been previously limited in most cases to three variables, probably for those reasons of computational feasibility. Similar dimensionality problems have plagued the evaluation of other models for mixed type data (cf. Cox & Wermuth, 1992).
We propose a class of multivariate Tobit models that reduces the dimensionality problem and provides a parsimonious summary of high-dimensional censored survey data. The models are tailored to situations where there are J variables measured on N units, and are suited for exploratory analysis where the aim is to derive a number of latent variables that capture the multivariate dependencies among the J observed variables, as in standard factor models (Bartholomew & Knott, 1999; Bartholomew, 1987). Our work is in line with that of Sammel and Ryan (1996), Sammel, Ryan and Legler (1997), Arminger and Küsters (1988), and Lance, Cornwell and Mulaik (1988), who proposed latent variable models for mixed type data.
Maximum likelihood factor analysis was originally developed for continuous (normally distributed) observed variables; factor models for binary outcome variables were developed by Bartholomew (1987), while Muthén (1989) first proposed a factor model for censored data. We extend the pioneering work of Muthén (1989) and Waller and Muthén (1992), who developed a three-stage estimation procedure for confirmatory factor analysis models for censored data, in several ways. First, we develop a framework for both exploratory and confirmatory Tobit factor modeling. Second, we use a flexible Type-2 Tobit formulation. Third, we employ a simultaneous estimation procedure applying simulated maximum likelihood (Gouriéroux & Montfort, 1996) to solve the problem of the evaluation of higher-order integrals involved in the estimation. The factor structure that we impose on the covariance matrix of the unobserved variables not only renders estimation simpler, but also lends itself to interesting substantive interpretations and graphical representation.
Our purpose is to add to the factor analysis literature by developing exploratory factor models for outcome variables that are censored, which seems to be particularly useful in behavioral research. With our model one can deal with exploratory and confirmatory ML factor analysis of data with mixed type variables, in which a (large) number of observations are equal to zero. We describe the model and estimation procedure next, and provide two applications to behavioral data, on the prescription of pharmaceutical products and on multi-category buying behavior, respectively.
Assume a rectangular data matrix $Y$ classified by $n = 1, \ldots, N$ sampling units and $j = 1, \ldots, J$ variables. The observations are realizations of the random variable $y_n = (y_{nj})$ and may take on non-negative values. We consider a Type-2 Tobit model. Type-2 models are more general than Type-1 Tobit models in that they allow for a different data-generating mechanism for the zero and the nonzero data. In particular, we allow for two types of partially observed variables, $y^1_{nj}$ and $y^2_{nj}$, that drive the zero and non-zero observations, respectively. These partially observed variables differ in their mean values, $\nu_j$ and $\mu_j$, and are linked through a common set of latent factors, $x_n$, causing a common covariance structure among them. Thus we assume that there is a common set of latent factors underlying the zero and non-zero observed data (alternatively, the model could be seen as a set of seemingly unrelated regressions, with truncated outcome variables). A Type-1 Tobit factor model can be obtained by constraining $\nu_j = \mu_j$ for all $j$. If the Type-1 model holds, then the data generation mechanisms for the zero and non-zero data are the same. The different means for the two types of partially observed variables allow the percentage of zero observations for each observed variable to vary independently of the mean. We define the model as follows:
and specify a factor structure on $u_n = (u_{nj})$:
with $x_n$ the $n$th row of the $(N \times P)$ matrix $X$ representing the scores of the subjects on $P$ latent factors for some assumed value of $P$, $\Lambda$ a $(J \times P)$ matrix of fixed parameters, $\nu = (\nu_j)$ and $\mu = (\mu_j)$ $(J \times 1)$ vectors with an intercept for each observed variable (for the censoring and the continuous portion of the model, respectively), and $\varepsilon_n = (\varepsilon_{nj})$ a vector of independent error terms with $\varepsilon_{nj} \sim N(0, \sigma_j^2)$. We specify the $x_n$ i.i.d. standard normal, so that $E(u_n) = 0$ and $\mathrm{Cov}(u_n) = \Psi + \Lambda\Lambda'$, with $\Psi = \mathrm{diag}(\sigma_j^2)$. The $x_n$ are the latent factors, the $\lambda_{jp}$ are called the factor weights and $\sigma_j^2$ the unique variances for the $J$ variables. We accommodate both exploratory factor models, in which $\Lambda$ is free, and confirmatory factor models, in which, according to prior theory, the number of factors $P$ as well as the structure of the weight matrix $\Lambda$ are known. Since Muthén (1989) already dealt with confirmatory factor models, we predominantly restrict attention to the exploratory factor models.
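To make the data-generating process tangible, the sketch below simulates censored data with a factor structure. It rests on an assumption that should be read as such: the display equations are not recoverable from this text, so the code assumes that both partially observed variables share one disturbance $u_{nj} = \lambda_j' x_n + \varepsilon_{nj}$ and differ only in their intercepts (called `nu` for the censoring part and `mu` for the continuous part here). All names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_tobit_factor(n_units, lam, nu, mu, sigma, seed=0):
    """Simulate an (n_units x J) censored data matrix.

    Assumed form (a reading of the text, not the authors' equations):
      u  = lam[j] . x + eps,  x ~ N(0, I),  eps ~ N(0, sigma[j]^2)
      y1 = nu[j] + u   decides zero versus non-zero
      y2 = mu[j] + u   is the value recorded when y1 > 0
    """
    rng = random.Random(seed)
    n_vars, n_factors = len(lam), len(lam[0])
    data = []
    for _ in range(n_units):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        row = []
        for j in range(n_vars):
            u = sum(lam[j][p] * x[p] for p in range(n_factors))
            u += rng.gauss(0.0, sigma[j])
            y1 = nu[j] + u
            y2 = mu[j] + u
            row.append(y2 if y1 > 0 else 0.0)
        data.append(row)
    return data


# Illustrative parameter values (invented for this sketch).
lam = [[0.8], [0.5], [0.3]]
nu = [0.0, 1.0, -1.0]
mu = [1.0, 2.0, 0.5]
sigma = [0.6, 0.6, 0.6]
data = simulate_tobit_factor(20000, lam, nu, mu, sigma)

for j in range(len(lam)):
    total_sd = (sum(l * l for l in lam[j]) + sigma[j] ** 2) ** 0.5
    expected_zero_share = NormalDist().cdf(-nu[j] / total_sd)
    observed_zero_share = sum(1 for row in data if row[j] == 0.0) / len(data)
    print(j, round(expected_zero_share, 3), round(observed_zero_share, 3))
```

Under this assumed form the share of zeros for each variable depends on the censoring intercept only, while the level of the positive observations is shifted by the second intercept, which is the separation the text describes.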
Often in factor analysis one wants to interpret either standardized factor weights or the factor loadings, which are defined as the correlation of each partially observed variable with the latent factor in question (Bartholomew, 1987, p. 49):
$\Gamma = [\mathrm{diag}(\Lambda\Lambda' + \Psi)]^{-1/2} \Lambda$.
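As a worked illustration of the loadings, the sketch below rescales each row of the weight matrix by the square root of that variable's total variance (the sum of squared weights plus the unique variance), which gives the correlation between the partially observed variable and each standard-normal factor. The function name and the example values are hypothetical.

```python
def factor_loadings(lam, psi):
    """Return the loadings: each row of lam divided by the standard
    deviation of the corresponding partially observed variable.

    lam: J x P list of factor weights; psi: list of J unique variances.
    """
    loadings = []
    for lam_j, psi_j in zip(lam, psi):
        total_var = sum(l * l for l in lam_j) + psi_j
        loadings.append([l / total_var ** 0.5 for l in lam_j])
    return loadings


# Invented example with J = 2 variables and P = 2 factors.
lam = [[0.8, 0.0], [0.3, 0.4]]
psi = [0.36, 0.75]
print(factor_loadings(lam, psi))
```

Because the factors are standardized and mutually independent, every loading computed this way lies between -1 and 1.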
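in which $\Theta$ contains all parameters and $\Phi(\cdot)$ and $\phi(\cdot)$ are the normal distribution and density functions. The expectation of $y_{nj}$, given that it is positive, is $E(y_{nj} \mid x_n;\ y_{nj} > 0) = \mu_j + x_n'\lambda_j + \sigma_j h(\cdot)$, with $\mu_j$ the intercept of the continuous portion of the model, $\lambda_j$ the $j$th row of $\Lambda$, and $h(\cdot) = \phi(\cdot)/\Phi(\cdot)$ the hazard rate of the normal distribution. Thus the ... $E(y_{nj} \mid x_n) = \ldots$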
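The role of the hazard-rate term in a conditional expectation of this kind can be checked numerically. The sketch below is limited to the simplest case, a single normal latent variable censored at zero (the Type-1 situation, where the censoring and outcome intercepts coincide), and compares the closed form $m + \sigma\,\phi(m/\sigma)/\Phi(m/\sigma)$ with a Monte Carlo average. Function names and the values of `m` and `sigma` are invented for illustration.

```python
import random
from statistics import NormalDist

_STD = NormalDist()


def hazard(a):
    """h(a) = phi(a) / Phi(a) for the standard normal distribution."""
    return _STD.pdf(a) / _STD.cdf(a)


def truncated_mean_formula(m, sigma):
    """E[y* | y* > 0] for y* ~ N(m, sigma^2), in closed form."""
    return m + sigma * hazard(m / sigma)


def truncated_mean_mc(m, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    kept = [y for y in (rng.gauss(m, sigma) for _ in range(n)) if y > 0]
    return sum(kept) / len(kept)


print(truncated_mean_formula(0.5, 1.0), truncated_mean_mc(0.5, 1.0))
```

The two numbers agree up to simulation error, which shows why the mean of the positive observations exceeds the latent mean by an amount that grows as censoring becomes more severe.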
The likelihood contribution of subject $n$ is:
where the two products run over the censored and the uncensored observations, respectively. Note that the proposed factor structure reduces the $J$-tuple normal integral in the full multivariate Tobit model to a $P$-tuple ($P < J$) integral. Our approach is in line with exploratory factor models that have been developed in the psychometrics literature for normal and binomial variables (Bartholomew, 1987; Krzanowski & Mariott, 1995, Ch. 12; Bartholomew & Knott, 1999).
The standard exploratory factor model suffers from location, scale and rotation indeterminacy (Bartholomew, 1987, p. 97; Bekker, Merckens, & Wansbeek, 1994, pp. 84-90). For an extensive discussion about indeterminacy in the factor analysis model, see the debate in this Journal, initiated by Mauran (1996). Here, we expand upon location, scale and rotation invariance in the factor Tobit model described above. The model is invariant under an arbitrary translation of the means of the distribution of $x_n$. Scale invariance occurs since the scale of $\Lambda$ and the variance of $x_n$ are not separately identified. Location and scale invariance are alleviated by fixing the mean and variance of the distribution of the latent factors: $x_n \sim N(0, 1)$. If we assume $\Lambda^* = \Lambda R$ with $R$ an orthogonal rotation matrix, we have $f(y_n \mid x_n; \Theta) = f(y_n \mid x_n^*; \Theta^*)$, for $x_n^* = x_n R^{-1}$. The distribution of $x_n^*$ equals $f(x_n^* R)$, since the Jacobian is 1. Since the distribution of $x_n R^{-1}$ is the same as that of $x_n$ for $R$ orthogonal and $x_n$ normal (Lancaster, 1954), the unconditional distribution of $y_n$ under the new parameterization, $f(y_n; \Theta^*)$, is the same as that under the original parameterization, $f(y_n; \Theta)$. Thus, the model is rotation invariant. Rather than imposing constraints on the parameters to alleviate this invariance, which yields confirmatory models, we adhere to the convention in the exploratory factor analysis literature to choose from among the set of possible solutions the one that has most substantive meaning (cf. Bartholomew, 1987, p. 96; Krzanowski & Mariott, 1995, p. 138). In reporting the number of parameters we take the identification constraints into account. Note that several sets of $P(P - 1)/2$ constraints imposed on the matrix $\Lambda$ alleviate the rotation indeterminacy and render the model a confirmatory factor analysis model, in line with Muthén (1989) and Waller and Muthén (1992), which greatly facilitates obtaining identified models. Bekker, Merckens and Wansbeek (1994, pp. 87-88) discuss linear identifying restrictions for the factor model in detail.
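Rotation invariance can be verified numerically: multiplying the weight matrix by any orthogonal matrix leaves the implied covariance of the partially observed variables (weights times their transpose, plus the diagonal matrix of unique variances) unchanged. The sketch below checks this for an arbitrary planar rotation; the matrices are invented for illustration and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rotation(theta):
    """2 x 2 orthogonal rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def implied_covariance(lam, psi):
    """Covariance implied by the factor model: lam lam' + diag(psi)."""
    cov = matmul(lam, transpose(lam))
    for j, psi_j in enumerate(psi):
        cov[j][j] += psi_j
    return cov


lam = [[0.8, 0.1], [0.5, 0.4], [0.2, 0.7]]
psi = [0.3, 0.5, 0.4]
lam_rot = matmul(lam, rotation(0.6))

cov_a = implied_covariance(lam, psi)
cov_b = implied_covariance(lam_rot, psi)
max_diff = max(abs(cov_a[i][j] - cov_b[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

The weights themselves change under rotation while the covariance does not, which is why a rotation criterion or a set of identifying constraints is needed to single out one solution.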
We estimate the Tobit factor model by maximizing the likelihood function, defined as the product of the individual likelihood contributions in Equation 7 across $n$. We use simulation to evaluate the integrals (Gouriéroux & Montfort, 1996). In simulated maximum likelihood (SML) estimation, the likelihood contributions in Equation 7 are approximated by an average over simulated factor values:
$\hat{L}_n(\Theta) = \frac{1}{T} \sum_{t=1}^{T} L_n(\Theta \mid z_t)$,
where $L_n(\Theta \mid z_t)$ denotes the contribution in Equation 7 evaluated conditional on $x_n = z_t$, and $z_t$ is drawn $T$ times from $N(0, 1)$. An appealing aspect of SML estimation is that the simulated likelihood function in Equation 8 is twice differentiable, simplifying optimization with Newton-type algorithms (details on SML and the first order derivatives of our model are provided in the appendix).
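The simulation step can be sketched in a few lines. To stay on firm ground, the example below covers only the Type-1 special case (equal censoring and outcome intercepts), where, conditional on the factors, each variable is an independent univariate Tobit: a normal density for positive observations and a normal probability for zeros. It is not the authors' implementation; the names are hypothetical, and the use of one fixed set of draws is a design choice of this sketch.

```python
import random
from statistics import NormalDist

_STD = NormalDist()


def conditional_likelihood(y, z, lam, mu, sigma):
    """Likelihood of one subject's row y given a factor draw z (Type-1 case)."""
    like = 1.0
    for j, y_j in enumerate(y):
        mean = mu[j] + sum(l * z_p for l, z_p in zip(lam[j], z))
        if y_j > 0:
            # Uncensored: normal density of the observed value.
            like *= _STD.pdf((y_j - mean) / sigma[j]) / sigma[j]
        else:
            # Censored at zero: probability that the latent value is <= 0.
            like *= _STD.cdf(-mean / sigma[j])
    return like


def draw_factors(n_draws, n_factors, seed=0):
    """Draw the simulated factor values once, so they can be reused."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_factors)]
            for _ in range(n_draws)]


def simulated_likelihood(y, lam, mu, sigma, draws):
    """Average of the conditional likelihood over the simulated factors."""
    total = sum(conditional_likelihood(y, z, lam, mu, sigma) for z in draws)
    return total / len(draws)


draws = draw_factors(200, 1)
print(simulated_likelihood([1.2, 0.0], [[0.8], [0.5]], [0.3, 0.1], [0.6, 0.9], draws))
```

Keeping the draws fixed while the parameters change makes the simulated likelihood a smooth function of the parameters, so that a Newton-type optimizer can be applied to the sum of its logarithms across subjects.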
The large-sample properties of ML estimators in factor analysis have been investigated by Anderson and Rubin (1956), and by Gill (1977), who demonstrate consistency of the estimates. Ignoring the censoring mechanism in formulating the likelihood for the model defined by Equations 1 and 2, however, leads to inconsistent estimates. This can be seen from Equation 5, using $x_n \sim N(0, 1)$, so that the results of Greene (1981) apply. Consistency of the MLE in the factor Tobit model, given an arbitrary restriction on $\Lambda$ to obtain uniqueness (Bekker, Merckens & Wansbeek, 1994, p. 87), follows from Amemiya (1973), who proved that the ML estimator in the Tobit model is consistent [if the parameter space is compact, $(x_n)$ are ... $X'X/N > 0$].
Gouriéroux and Montfort (1996) describe the application of SML to the estimation of a Tobit model. Their model includes an individual random effect that follows a (standard) normal distribution and arises as a special case of our model for $P = 1$ and $\Lambda = \lambda$, a scalar. They show that the SML estimator is consistent if $\sqrt{N}/T \to 0$ as $N \to \infty$, and asymptotically equivalent to the MLE. The bias is of order $1/T$. Simulation studies by Keane (1993) and Lee (1995, 1997) show that SML has excellent properties for finite values of $T$. We use $T = 200$ in the application below (Harris & Keane, 1999, for example, use a similar number in a choice model).
In some cases, prior theory may be available to guide the choice of the number of latent factors, $P$. In cases where the aim is only to take heterogeneity into account, $P = 1$ is often chosen. If the aim is graphical display of the dependency structure of the $J$ partially observed variables, as in the factor analysis literature (cf. Bartholomew, 1987), $P = 2$ or $P = 3$ is a convenient choice. However, in many cases the value of $P$ needs to be determined from the data. Akaike (1987) argues for the AIC statistic to compare models with various values of $P$. A limitation of the AIC statistic, however, is that it is not dimension consistent: it does not asymptotically indicate the true model among a set of candidate models (Bozdogan, 1987). In response, several authors have proposed dimension consistent criteria, such as the Bayesian Information Criterion, $\mathrm{BIC} = -2 \ln L + [J(P + 2) - P(P - 1)] \ln(JN)$ (Schwartz, 1978), and the very similar Consistent Akaike Information Criterion, the CAIC statistic (Bozdogan, 1987). Based on the assumptions that model dimensionality is fixed as $N \to \infty$ and that the true model is among the set of candidate models, these statistics indicate the true model with probability one, asymptotically. Dimension consistent criteria have been criticized because the assumption that the true model is among the set of models considered is unlikely to hold in most applications, and because high probabilities of selecting the true model accrue only at large sample sizes (Burnham & Anderson, 1998, p. 69). However, in our application below the number of observations is large, a situation where empirical work supports the use of BIC (Rust, Simester, Brodie & Nilikant, 1995). We therefore base model selection on the BIC statistic (reporting AIC as well), which tends to result in more parsimonious models than the AIC statistic. This parsimony is preferable because of the easier interpretability of the factor solution in low-dimensional spaces.

Illustration to the Analysis of Drug Prescription Behavior
Prescription-drug therapy is highly cost-effective as compared to other medical interventions, such as hospitalization or surgery, while it often eliminates the need for those types of treatment. Pharmaceutical manufacturers sell prescription drugs to wholesalers, who in turn distribute the products to retail HMOs, hospitals, and clinics. Industry sales reached $124.6 billion in 1998 (PhRMA, 1999). The main drivers of growth in this sector of the pharmaceutical market have been non-price factors such as the increased volume and the changing mix of prescriptions, accounting for 80 percent of growth in 1998 (PhRMA, 1999). Even though prescription drugs are sold to patients, most of the manufacturers' marketing effort is focused on physicians, who often decide the particular drug to be used by the patient. Prescription drug sales are hardly affected by pricing strategies of the pharmaceutical companies since neither the prescribing physicians nor the patients bear the costs. The American Medical Association is concerned over the increased competition from "over-the-counter" (OTC) drugs, because of the potential for mistreatment of illnesses. Thus, there is great interest in the volume and pattern of prescriptions by physicians, from pharmaceutical companies and policy makers in health care alike. In order to gain insights into the tendencies of drug prescription among a heterogeneous population of physicians, we analyze U.S. data on prescriptions for 33 different pharmaceutical drugs written by a sample of 500 physicians during a period of one year. The drugs fall into six classes: convulsion, Parkinson, psychotherapeutic, anti-depressants, analgesics, and arthritic drugs.
We apply our Tobit factor model to identify latent dimensions underlying physician prescription behavior. Prescription behavior is assumed to derive from physicians' latent tendencies to prescribe drugs, which may be affected by their specialization, the type and numbers of patients they treat, as well as marketing activities of pharmaceutical companies. We take the prescribed volumes of drugs as indicators of those latent tendencies, and we do not have a priori hypotheses on those prescription tendencies. Once the parameters for our multivariate Tobit factor model are obtained, we apply them to compute factor scores for a hold-out sample of 4,361 physicians. We estimated the model for P = 1 through P = 4 latent factors, obtaining the statistics in Table 1. The BIC criterion is minimal for the 3-factor solution.
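Choosing the number of factors by BIC, as described above, amounts to evaluating the criterion for each candidate P and taking the minimum. The sketch below implements the BIC expression exactly as it is printed in this article, with penalty term [J(P + 2) - P(P - 1)] ln(JN). The log-likelihood values are invented purely for illustration and are not the values reported in Table 1; the function names are hypothetical.

```python
import math


def n_penalty_terms(n_vars, n_factors):
    """Bracketed term of the BIC expression as printed in the article."""
    return n_vars * (n_factors + 2) - n_factors * (n_factors - 1)


def bic(loglik, n_vars, n_factors, n_units):
    """BIC = -2 ln L + [J(P + 2) - P(P - 1)] ln(JN)."""
    penalty = n_penalty_terms(n_vars, n_factors) * math.log(n_vars * n_units)
    return -2.0 * loglik + penalty


def select_factors(logliks, n_vars, n_units):
    """Return the P with the smallest BIC; logliks maps P to ln L."""
    return min(logliks, key=lambda p: bic(logliks[p], n_vars, p, n_units))


# Invented log-likelihoods, NOT taken from Table 1.
hypothetical_logliks = {1: -60000.0, 2: -59500.0, 3: -59400.0, 4: -59350.0}
best_p = select_factors(hypothetical_logliks, n_vars=33, n_units=500)
print(best_p)
```

Because the penalty grows with both P and ln(JN), an additional factor is retained only when it improves the log-likelihood by more than its share of the penalty.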
Parameter estimates for the P = 3 solution are shown in Table 2 (the factor weights are standardized and rotated to achieve better interpretability using the Varimax method; Kaiser, 1958). Relatively high factor weights are underlined. This table also shows two measures of fit at the item (drug) level: the correlation between the observed and fitted volume of prescriptions among prescribing physicians (R), and the percentage of correct predictions of whether each data point is observed or censored at zero (%C). Based on these two measures, one may conclude that the model fits reasonably well to the discrete portion of the Tobit model, correctly predicting between 26% (for Zyprexa) and 95% (for Ambien) of the censored/non-censored observations. For the continuous portion of the model the correlations between actual and fitted prescription volumes are between 0.15 (for Prosom) and 0.97 (for Imitrex Inj.).

Table 1 Fit Statistics for the Tobit Factor Model for the Drug Prescription Data

Table 2 P = 3 Tobit Factor Model, Drug Prescription Data
Note. R = correlation between actual and fitted prescription volume among prescribing physicians. %C = percentage of correct predictions of observed or censored data. Factor weights are standardized.
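The two item-level fit measures can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: R is taken to be the Pearson correlation between observed and fitted volumes over the units with a positive observation, and %C is computed by classifying an observation as non-censored when its fitted probability of being positive exceeds a threshold of 0.5 (the classification rule is not given in the text). Names and example values are invented.

```python
def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_among_users(observed, fitted):
    """R: correlation of observed and fitted volume where observed > 0."""
    pairs = [(o, f) for o, f in zip(observed, fitted) if o > 0]
    return pearson([o for o, _ in pairs], [f for _, f in pairs])


def percent_correct(observed, prob_positive, threshold=0.5):
    """%C: share of units whose zero/non-zero status is predicted correctly.

    The 0.5 threshold is an assumption of this sketch.
    """
    hits = sum((o > 0) == (p > threshold)
               for o, p in zip(observed, prob_positive))
    return 100.0 * hits / len(observed)


observed = [0.0, 0.0, 2.0, 5.0, 9.0]
fitted = [1.0, 0.5, 2.5, 4.5, 8.0]
prob_positive = [0.2, 0.6, 0.7, 0.9, 0.4]
print(fit_among_users(observed, fitted), percent_correct(observed, prob_positive))
```

Reporting the two measures separately mirrors the structure of the model: %C speaks to the discrete (censoring) portion and R to the continuous portion.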
The censoring intercepts indicate that several drugs have a particularly high probability of being prescribed: Zoloft, Serzone and Paxil (Anti-depressives), Imitrex Tablets (Migraine), Ultram (Analgesic), Risperdal (Psychotherapeutic), and Relafen and Daypro (Arthritis). Note that most of the drugs with large censoring intercepts (i.e., high market penetration) also have relatively large intercepts for the continuous portion of our model ($\mu_j$) (the correlation between the two sets of intercepts is 0.89). This indicates that physicians who prescribe these drugs tend to prescribe a large volume compared to other drugs. In contrast, the parameters for some of the low-penetration drugs such as Adderall and Tegretol indicate that while a small proportion of physicians ever prescribe those drugs (small censoring intercept $\hat{\nu}_j$), those who do so tend to prescribe them in relatively high volumes (i.e., large $\hat{\mu}_j$).
The high correlation of the two sets of intercepts may indicate that a Type-1 Tobit model is more appropriate. In order to test for the differences in means among the discrete and continuous portions of our multivariate Tobit model, we compare the Type-2 factor model against a Type-1 model, where $\nu_j = \mu_j$ for all $j$ (that is, the censoring intercept equals the intercept of the continuous portion). The LR test for the 3-factor solution (also chosen for the Type-1 model on the basis of BIC) yields a value of 1639.2 on 33 df, which is highly significant, showing that the Type-1 model provides a worse representation of the data and that both types of means are required. This shows that the proportion of zeros in the data is independent of the mean of the distribution of the observed variables, but that the positive and zero data each obey specific data generation processes. Whereas the means capture the aggregate prescription behavior across the population, the factor structure captures heterogeneity among physicians through the underlying distribution of the latent factors.
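The size of the reported likelihood-ratio statistic can be put in perspective by comparing it with a chi-square critical value for 33 degrees of freedom. The sketch below uses only the Python standard library, so the critical value is obtained from the Wilson-Hilferty normal approximation rather than an exact quantile; that approximation is a substitution made for this example, and the function names are hypothetical. The statistic and the degrees of freedom are the ones quoted in the text.

```python
from statistics import NormalDist


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood-ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_critical_wh(df, alpha):
    """Approximate upper-alpha chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


reported_lr = 1639.2   # value reported in the text
df = 33                # one equality constraint per drug
critical = chi2_critical_wh(df, alpha=0.001)
reject_type1 = reported_lr > critical
print(round(critical, 1), reject_type1)
```

Even at a very strict significance level the critical value is far below the reported statistic, which is consistent with the conclusion that the Type-1 restriction is rejected.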
Table 2 presents the estimated factor loadings, $\hat{\Gamma}$, with the largest loadings underlined. The loadings are displayed graphically in Figures 1 and 2. These figures show the loadings for the drugs (top panel) and related therapies (bottom panel) on Factors 1-2 and 1-3, respectively. Each point on these plots represents a vector terminus for a particular drug (cf. Bartholomew, 1984).
Visual inspection of the pattern of factor weights and factor loadings in Table 2 and the top panel of Figure 1, together with discussion with pharmacists, leads to the interpretation of Factor 1 as the propensity to prescribe neurological drugs, in particular Imitrex for migraines, and drugs for Parkinson's disease and seizures. Note, however, that specific drugs for Alzheimer's disease (Aricept) and analgesics (Stadol and Tegretol) also load relatively high on this dimension, indicating that there is a tendency to prescribe these drugs jointly with these neurological drugs. The second factor is related to analgesic drugs such as Daypro, Naprelan and Lodine. Therefore, physicians with a high score on this dimension are more likely to prescribe and to be heavy prescribers of analgesics and drugs against arthritis. However, note that several of the drugs for migraine (Imitrex), psychotherapy (Prosom) and Parkinson (Sinemet) also have somewhat higher loadings, indicating a propensity to prescribe those drugs in conjunction with the analgesics. Based on Figure 2 and Table 2, Factor 3 can be interpreted as the propensity to prescribe drugs against depression and psychosis. Physicians with a high score on this dimension are more likely to prescribe this type of drug, and more likely to prescribe a higher volume of these drugs than other physicians. Note that again a Parkinson drug (Aricept) loads high on this dimension, while the general purpose analgesics, Stadol and in particular Tegretol, also load relatively high. Note that another analgesic, Toradol, tends not to be prescribed with drugs against psychosis.
In order to ascertain the validity of our results, we use the parameter estimates from Table 2 to compute the factor scores for the hold-out sample of 4,361 physicians and verify whether their scores are consistent with their area of specialization. Figure 3 shows the average score of all physicians by area of specialization. On its top panel, one can see that Neurologists have the highest scores on Factor 1, which was identified as the propensity to prescribe neurological drugs, a quite intuitive finding. Similarly, Orthopedists have the highest average scores on Factor 2, and therefore have the highest propensity to prescribe anti-arthritic drugs and analgesics. Note that physicians in Family, Internal and Preventive medicine also have high scores for Factor 2, while their scores on Factor 1 are higher than for the Orthopedists. These less specialized physicians tend to prescribe analgesics among a wider range of drugs. Some drugs that load high on Factors 1 and 3 also load on Factor 2, possibly because they are often first prescribed to patients by neurological and psychiatric specialists, while later prescriptions are taken over by family doctors (this is known to occur for, e.g., Imitrex, Sinemet, Stadol, Prosom). The bottom panel of Figure 3 shows that Psychiatrists have the highest scores on Factor 3, the propensity to prescribe psychiatric drugs, again a quite intuitive finding that lends the results face validity. Figure 3 reveals a number of meaningful clusters of specialties based on prescriptions. For example, the related specializations Orthopedy, Anesthesy, and Physiotherapy cluster together on the three factors, and there is a general medicine cluster consisting of family, internal, preventive and emergency medicine.

Figure 1 Drug Prescription Factors 1 and 2

Figure 2 Drug Prescription Factors 1 and 3

Figure 3 Drug Prescription Average Factor Scores for Physicians by Specialty

Illustration to the Analysis of Cross-Category Buying
The number of brands in the global marketplace has rapidly expanded in the eighties and nineties. In the face of competition, manufacturers push the boundaries of product use, create new usage situations for existing brands, extend product lines to transfer brand equity beyond the original category and pursue bundling and cross-category selling. For retailers, category management has emerged as an effective marketing strategy. In category management, the firm markets brands in a way that exploits the pattern in which consumers assemble baskets of products, based on an understanding of that behavior. For purposes of manufacturer and retailer marketing strategies alike, knowledge of preferences across and within categories is essential. Knowledge of multiple-category preference patterns allows the retailer to predict the likely composition of market baskets and allows manufacturers and retailers to assess the profitability of product bundling strategies. Therefore, the academic marketing literature has recently seen an upsurge in the analysis of cross-category buying behavior of consumers (e.g., Russell & Kamakura, 1997; Ainsly & Rossi, 1998; Seetharaman, Ainsly & Chintagunta, 1999). In line with that stream of research, we apply the factor Tobit model to analyze cross-category purchasing data and reveal consumers' propensities to buy products from different categories, to address the central role of cross-category preferences in the design and implementation of these marketing strategies.
The data analyzed here are taken from a panel of 626 Canadian households in one market area. The data consist of the total volume (in equivalent units) purchased of brands in four paper goods categories
(Toilet Paper, TP, 10 brands; Paper Towel, PT, 10 brands; Facial Tissue, FT, 9 brands; Table Napkins, TN, 10 brands) for a one-year period. The data are collected by scanning equipment at the checkout counters of retailers, among a panel of consumers who are identified through special cards. The data set was analyzed previously by Russell and Kamakura (1997) using a latent-class model. The analysis is based upon the purchase volumes of twelve brands with a (volume-based) market share greater than 0.5% in at least one product category, with the remaining brands combined into an “Other” brand group. Seven of the brands compete in all product categories. For reasons of confidentiality, we cannot disclose the brand names in these categories, but indicate them with letters. We use the factor models to identify the latent propensities to buy products across those four categories and to address the question of whether brand-specific (complementary) or category-specific (competitive) factors can be identified. This facilitates the development of cross-category marketing strategies, including bundling, cross-category promotional programs, cross-category loyalty programs and cross-category positioning.
We apply the factor models for P = 1 to P = 5 factors. Table 3 shows the fit statistics for the models. BIC is minimal for P = 3. The factor weights and loadings for the P = 3 model are reported in Table 4 (the factor weights are standardized and rotated using Varimax; high weights and loadings are underlined). This table also shows the correlation between the observed and fitted volume of purchases among buyers (R) and the percentage of correct predictions of whether each product is bought (%C). Based on these two measures, one concludes that the model fits the discrete portion of the Tobit model reasonably well, correctly predicting between 34% (for PT-SCT) and 97% (for TN-RTA) of the censored/non-censored observations. For the continuous portion of the model, the correlations between actual and fitted purchase volumes are between 0.18 (for TP-FAC) and 0.81 (for PT-GRN).
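The two fit measures described above, together with BIC, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 0.5 classification threshold, and the use of the standard Schwarz form of BIC are not taken from the article, and the toy numbers are made up for the example.

```python
import numpy as np

def fit_measures(y_obs, y_fit, p_buy, threshold=0.5):
    """Return (R, %C) for one product.

    y_obs : observed volumes, with 0 meaning "not bought" (censored)
    y_fit : fitted volumes from some model (hypothetical input)
    p_buy : fitted purchase probabilities (hypothetical input)
    threshold : assumed cut-off for predicting a purchase
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    p_buy = np.asarray(p_buy, dtype=float)
    bought = y_obs > 0
    # R: correlation between observed and fitted volumes among buyers only
    r = np.corrcoef(y_obs[bought], y_fit[bought])[0, 1]
    # %C: share of households whose bought / not-bought status is predicted correctly
    pct_correct = 100.0 * np.mean((p_buy >= threshold) == bought)
    return r, pct_correct

def bic(loglik, n_params, n_obs):
    """Standard Schwarz criterion (assumed form): smaller is better."""
    return -2.0 * loglik + n_params * np.log(n_obs)

# Toy usage with made-up numbers for five households
r, pct_correct = fit_measures(
    y_obs=[0, 2, 4, 0, 6],
    y_fit=[0.5, 1, 2, 0.1, 3],
    p_buy=[0.2, 0.9, 0.4, 0.1, 0.8],
)
```

In practice one would compute `bic` for each candidate number of factors and retain the model with the smallest value, which is how the P = 3 solution is selected in the text.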
The censoring intercepts indicate that some brands have a high probability of being purchased, in particular KLN in the facial tissue category and, to a lesser extent, FAC in the toilet paper category and RTB in the table napkin category. Note that the intercepts reflect differences in purchase incidence of the categories as a whole, where the table napkin category has a low purchase incidence and the toilet paper and paper towel categories have higher incidence. As indicated by high values of the intercepts of the continuous part of the model, products that are bought in large volumes are KLN in the facial tissue category and RTB in the table napkin category. Here, the correlation between the censoring
Table 3. Fit Statistics for the Tobit Factor Model, Paper Goods Data
intercepts (indicating high incidence) and the intercepts of the continuous part of the model (indicating high volumes bought) is lower (0.47) than in the drug application. This indicates that the Type 1 Tobit factor model would not provide a good representation of the category data and shows that purchase incidence is independent of the mean of the distribution of the purchase quantities: the positive and zero data each obey specific data generation processes. Whereas the means capture the aggregate purchase behavior, the factor structure captures heterogeneity among consumers through the distribution of the latent factors.
The estimates in Table 4 display an interesting pattern of strong cross-category purchase tendencies for a few dominant brands, with a mixed pattern of within- and between-category competition. We interpret the patterns of factor weights and factor loadings in Table 4 (we have a slight preference for the factor weights since the solution is simpler to interpret, while the varimax rotation tends to produce a solution where many brands load high on the first factor). The factor weights and loadings show that the first factor is clearly a brand-specific factor capturing the unique purchase predisposition of consumers towards the (national) “green” brand GRN. Next to GRN, there is a cluster of brands with high loadings on this factor (FAC, SCT, and WHI). For example, in the paper towels (PT) category, high purchase rates of GRN tend to co-occur with the category-specific national brands MAJ and WHI. This indicates category-specific competition between those two brands. The second factor clearly represents a store-brand dimension, with strong factor weights on the two retail brands (RTA and RTB), and low weights on all other brands for all categories.
The third factor captures the position of the MFM brand across categories, since we find highly negative weights for MFM in all four categories. This particular brand appears to be uniquely positioned in all categories, showing a strong pattern of joint purchases across categories, with little direct competition with other brands. The single exception is the paper towels category, where competition comes from towels manufactured
by HID (as well as SCT and FAC), which show strongly negative weights on this factor. This third factor also shows higher positive loadings for the “green” brand GRN, indicating an almost diametrically opposite positioning for the MFM and GRN brands; consumers who buy one brand are highly unlikely to buy the other.
Figure 4 plots the three factors to display the competitive structure graphically. Once one considers the three-dimensional directions of the vectors representing each category-brand combination, one can see strong cross-category competitive positions for the MFM brand, and for the “green” brand GRN. These two national brands display very strong and
Table 4. P = 3 Tobit Factor Model, Paper Goods Data
Note. R = correlation between actual and fitted purchase volume among buyers; %C = percentage of correct predictions of observed or censored data.
Figure 4. Factor Plots for the Paper Goods Application
unique brand franchises across all four categories. Consumers who buy the brand in one category are highly likely to also buy the brand in all other categories. Most importantly, they are also unlikely to buy any other brand in these categories. The only exceptions are the paper towels by HID, which tend to be bought by buyers of the MFM brand, and paper towels by MAJ, which are bought by buyers of the GRN brand.
A similar pattern of cross-category preferences is found for the retail brands. Consumers who buy a retail brand in one product category are highly likely to also buy that same retail brand in other product categories. In contrast to the MFM and GRN brands, which occupy unique competitive positions, our Tobit factor map indicates that the two retail brands RTA and RTB clearly compete directly against each other.
Making distribution assumptions on observed variables has essentially moved factor analysis from the realm of descriptive methods to the area of statistical modeling. This practice dates back to Lawley (1940), who first developed maximum likelihood estimation methods for factor analysis. Later, factor analysis was developed for binary variables (Bartholomew, 1980), and for truncated variables (Muthén, 1989). Along the lines of these studies, we propose a factor model that is based on mixed discrete-continuous data. Rather than specifying different distributions for different variables, we have employed the Tobit framework that has gained great popularity for the analysis of consumption behavior. In doing that, we build directly on the pioneering work of Muthén (1989). Our model allows for the exploration of high dimensional data by displaying the latent structure underlying it, based on assumptions about the type of censoring mechanism. Much of our enterprise has been made possible by the advent of simulation methods. The Tobit modeling framework has the advantages over previously proposed models for mixed data that it is feasible for large
numbers of variables (cf. Cox & Wermuth, 1992) and directly restricts the outcomes of the continuous data to be positive (cf. Sammel, Ryan & Legler, 1997; Bartholomew & Knott, 1999, p. 173).
Our approach enables full (S)ML estimation of exploratory factor models, but also of confirmatory factor models when appropriate restrictions have been identified. Muthén (1989) proposes a confirmatory Tobit-type factor analysis, which was subsequently applied by Waller and Muthén (1992) to behavioral genetics. They propose a three-step procedure. First, univariate Tobit models are employed to estimate the mean and variance of the latent censored variable, using ML. Second, the bivariate correlation is estimated from the bivariate distributions by maximum likelihood, fixing the mean and variance parameters at the values estimated in the first step. Third, generalized least squares is applied to these estimated correlations to estimate a confirmatory factor model. A consistent estimator of the asymptotic covariance matrix of the correlations estimated from the two previous steps is used as a weight matrix in the GLS estimation procedure. Muthén’s approach overcomes the problem of high dimensional integration by reducing the P-variate normal integral to [J(J + 1)/2] two-dimensional integrals. Thus, the procedure requires running [J(J + 1)/2] Tobit models and a GLS confirmatory factor model. Although it is a multi-stage procedure, Muthén’s method provides consistent, but not efficient, estimates of the factor model parameters.
Our procedure extends that of Muthén in several ways, building upon his developments. First, Muthén’s approach deals with confirmatory models while we deal with both confirmatory and exploratory factor analysis, focusing on the latter. However, Muthén’s method may be relatively easily modified to deal with exploratory models as well. Second, Muthén presumably accommodates a Type-1 Tobit model, while we deal with a Type-2 factor model. The latter offers the advantage of providing a more flexible model of the censored and non-censored data. In the empirical applications on drug prescription and multi-category purchasing, we showed that the fit of the Type-2 model is significantly better than that of a Type-1 model. Further, rather than the three-stage estimation approach of Muthén, we estimate all parameters simultaneously with simulated maximum likelihood. This gives us consistent and asymptotically efficient estimates of the model parameters. The application of SML to Tobit factor models has not been previously described. A current limitation of the proposed SML procedure is its computational cost, a limitation shared by most models estimated using simulation. We expect this limitation to become less and less of a problem in the future with the increasing speed of computers. Note that Muthén’s method would require the estimation of 561 and 780 Tobit models for our two applications, respectively, with associated programming and data
Due to our simultaneous estimation procedure, our approach easily lends itself to extensions in various directions, including different distributions for the observed variables and the inclusion of predictor variables, which we however consider beyond the purpose of the present paper. Given the assumption of a normal distribution of the latent factors, our factor model may be seen as a way to include heterogeneity in Tobit models, along the lines of Gouriéroux and Montfort (1996). Instead of including a single random term with fixed variance to capture misspecification, our Tobit factor model includes P random terms, which are weighted differently for the observed variables, creating covariance among them. Gouriéroux and Montfort (1996) include predictors, which we have not done (except for the intercepts) since none were available in our applications, but the model can be extended to include such predictor effects.
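How P weighted random terms create covariance among the observed variables can be illustrated with a small simulation. The sketch below is not the article's Type-2 model: it assumes a simplified Type-1-style censoring at zero, unit error variances, and made-up intercepts and loadings (`mu`, `Lam`), purely to show that the latent covariance equals the loading matrix times its transpose plus the error variances.

```python
import numpy as np

rng = np.random.default_rng(0)
J, P, N = 4, 2, 200_000          # observed variables, factors, subjects (all hypothetical)

mu = np.array([0.5, -0.2, 0.0, 1.0])          # assumed intercepts
Lam = np.array([[0.9, 0.0],                   # assumed J x P loading matrix
                [0.7, 0.2],
                [0.0, 0.8],
                [-0.5, 0.6]])

x = rng.standard_normal((N, P))   # latent factors, standard normal
e = rng.standard_normal((N, J))   # independent errors with unit variance (assumption)

y_star = mu + x @ Lam.T + e       # latent (uncensored) variables
y = np.maximum(y_star, 0.0)       # observed data, censored at zero (Type-1-style assumption)

# Covariance of the latent variables implied by the factor structure
implied_cov = Lam @ Lam.T + np.eye(J)
sample_cov = np.cov(y_star, rowvar=False)
```

With a large simulated sample, `sample_cov` should be close to `implied_cov`; the off-diagonal entries come entirely from the shared factors, which is the sense in which the P random terms induce covariance.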
Ainslie, A. & Rossi, P. (1998). Similarities in choice behavior across product categories. Marketing Science, 17, 91-106.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52 (1), 317-332.
Amemiya, T. (1973). Regression analysis when the dependent variable is truncated normal. Econometrica, 41, 723-732.
Amemiya, T. (1985). Advanced econometrics. Cambridge: Harvard University Press.
Anderson, T. W. & Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium in Mathematical Statistics and Probability, 5, 111-150.
Arminger, G. & Küsters, U. (1988). Latent trait models with indicators of mixed measurement level. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models. New York: Plenum.
Bartholomew, D. J. (1980). Factor analysis for categorical data. Journal of the Royal Statistical Society, B, 42, 293-321.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. New York: Oxford
Bartholomew, D. J. & Knott, M. (1999). Latent variable models and factor analysis.
Bekker, P., Merckens, A., & Wansbeek, T. J. (1994). Identification, equivalent models and computer algebra. New York: Academic Press Inc.
Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.
Burkett, J. P. (1998). Bureaucratic behavior modeled by reduced-rank regression: The case of expenditures from the Soviet state budget. Journal of Economic Behavior and Organization, 34.
Burnham, K. P. & Anderson, D. R. (1998). Model selection and inference. New York:
Chib, S. (1992). Bayes inference in the Tobit Censoring Regression Model. Journal of
Cox, D. R. & Wermuth, N. (1992). Response models for mixed binary and quantitative variables. Biometrika, 79, 441-461.
DeSarbo, W. S. & Choi, J. (1999). A latent structure double hurdle regression model for exploring heterogeneity in consumer search patterns. Journal of Econometrics, 89, 423-456.
Donovan, J. E. (1993). Young adult drinking-driving: Behavioral and psychosocial correlates. Journal of Studies on Alcohol, 54, 600-614.
Fitzmaurice, G. M. & Laird, N. M. (1995). Regression models for bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association, 90, 845-852.
Gill, R. D. (1977). Consistency of maximum likelihood estimator of the factor analysis model when the observations are not multivariate normal. In J. R. Bara, F. Brodeau, G. Romier & B. van Cutsem (Eds.), Recent developments in statistics (pp. 437-440). Amsterdam: North Holland.
Gourieroux, C. & Montfort, A. (1996). Simulation based econometric methods. Oxford:
Greene, W. H. (1981). On the asymptotic bias of the ordinary least-squares estimator of the Tobit model. Econometrica, 49, 505-513.
Harris, K. M. & Keane, M. P. (1999). A model of health plan choice: inferring preferences and perceptions from a combination of revealed preference and attitudinal data. Journal of Econometrics, 89, 131-158.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475-492.
Johnson, N. L., Kotz, S. & Balakrishnan, N. (1995). Continuous univariate distributions,
Jones, A. & Posnett, J. (1991). Charitable donations by U.S. households: Evidence from family expenditure survey. Applied Economics, 23, 343-351.
Kaiser, H. F. (1958). The Varimax criterion for analytical rotation in factor analysis.
Keane, M. P. (1993). Simulation estimation for panel data models with limited dependent variables. In G. S. Maddala, C. R. Rao, H. D. Vinod (Eds.), Handbook of statistics. Amsterdam: Elsevier.
Krzanowski, W. J. & Mariott, F. H. C. (1995). Multivariate analysis, Kendall Library of Statistics 2. London: Arnold.
Lancaster, H. O. (1954). Traces and cumulants of quadratic forms in normal variables. Journal of the Royal Statistical Society, B, 16, 247-254.
Lance, C. E., Cornwell, J. M. & Mulaik, S. A. (1988). Limited information parameter estimates for latent or mixed manifest and latent variable models. Multivariate Behavioral Research, 23, 171-187.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 61, 176-185.
Lee, L. F. (1995). Asymptotic bias in simulated maximum likelihood estimation of discrete choice models. Econometric Theory, 437-483.
Lee, L. F. (1997). Simulated maximum likelihood estimation of dynamic discrete choice statistical models: Some Monte Carlo results. Journal of Econometrics, 82, 1-35.
Mauran, M. D. (1996). Metaphor taken as math: Indeterminancy in the factor analysis model. Multivariate Behavioral Research, 31(4), 517-538.
Mingshan, L. (1999). Separating the true effect from gaming in incentive-based contracts in health care. Journal of Economics & Management Strategy, 8, 383-423.
Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw Hill.
Muthén, B. O. (1989). Tobit factor analysis. British Journal of Mathematical and Statistical Psychology, 42, 241-250.
Olsen, R. J. (1978). A note on the uniqueness of the maximum likelihood estimator for the Tobit model. Econometrica, 46, 1211-1215.
PhRMA (1999). Industry profile 1998. PhRMA, USA.
Russell, G. J. & Kamakura, W. A. (1997). Modeling multiple category brand preference with household basket data. Journal of Retailing, 73, 439-462.
Rust, R. T., Simester, D., Brodie, R., & Nilikant, V. (1995). Model selection criteria: an investigation of relative accuracy, posterior probabilities and combinations of criteria. Management Science, 41, 322-333.
Sammel, M. D. & Ryan, L. M. (1996). Latent variable models with fixed effects.
Sammel, M. D., Ryan, L. M. & Legler, J. M. (1997). Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, B, 59 (3), 667-678.
Seetharaman, P. B., Ainslie, A. & Chintagunta, P. K. (1999). Investigating household state dependence across categories. Journal of Marketing Research, 36, 488-500.
Schwartz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-
Sher, K. J. & Wood, M. D. (1996). Alcohol outcome expectancies and alcohol use: A latent variable cross-lagged panel study. Journal of Abnormal Psychology, 105, 561-575.
Tobin, J. (1958). Estimation of relationships for limited dependent variables.
Vandenberg, J. & Richardson, R. (1999). The impact of high involvement work processes on organizational effectiveness. Group and Organization Management, 24 (3), 300-340.
Wales, T. J. & Woodland, A. D. (1980). Sample selectivity and the estimation of labor supply functions. International Economic Review, 21, 437-468.
Waller, G. N. & Muthén, B. O. (1992). Genetic Tobit factor analysis: Quantitative genetic modeling with censored data. Behavior Genetics, 22, 265-292.
Simulated Maximum Likelihood (Gouriéroux & Montfort, 1996)
The generic problem in which simulation can be applied is to evaluate a log-likelihood of the form
\[ L(\Theta \mid y) = \sum_{n=1}^{N} \ln \int f(y_n \mid x; \Theta)\, \phi(x)\, dx, \tag{A1} \]
where x is a P-dimensional multivariate random variable with a normal density $\phi(x)$, and $y_n$ is a J-dimensional observation vector. The estimator $\hat{\Theta}$ obtained by maximizing Equation A1, which is often done numerically, is consistent, efficient and asymptotically normal for a large class of models. If the dimensionality P of x is larger than three, standard numerical integration cannot be used to evaluate the log-likelihood. The idea of simulation is to draw T random variables $z_t$ from $\phi(\cdot)$ and use
\[ \tilde{L}(\Theta \mid y) = \sum_{n=1}^{N} \ln \frac{1}{T} \sum_{t=1}^{T} f(y_n \mid z_t; \Theta). \tag{A2} \]
Here $\tilde{L}(\Theta \mid y) \to L(\Theta \mid y)$ as $T \to \infty$ from the strong law of large numbers, so
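that the simulated likelihood function is a consistent simulator of the likelihood function. The value of $\Theta$ that maximizes Equation A2 is the SML estimator. SML provides consistent estimators only if $T \to \infty$ as $N \to \infty$. This can be seen as follows:
\[ \lim_{T \to \infty} \tilde{L}(\Theta \mid y) = \sum_{n=1}^{N} \ln \left[ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} f(y_n \mid z_t; \Theta) \right] = \sum_{n=1}^{N} \ln \int f(y_n \mid x; \Theta)\, \phi(x)\, dx = L(\Theta \mid y), \]
since the mean over t converges to the integral function for $T \to \infty$. Because $\tilde{L}(\cdot)$ is thus a consistent simulator of $L(\cdot)$, the estimator is consistent and asymptotically equivalent to the ML estimator.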
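The convergence of the simulated log-likelihood to the exact one can be checked numerically in a case where the integral is known in closed form. The sketch below assumes a toy model that is not part of the article: $f(y \mid x)$ is a normal density with mean x and unit variance and x is standard normal, so the exact marginal density of y is normal with mean 0 and variance 2. All function names are hypothetical.

```python
import numpy as np

def normal_pdf(v, mean=0.0, var=1.0):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def simulated_loglik(y, draws):
    """Sum over n of ln[(1/T) * sum over t of f(y_n | z_t)] with f = N(y; z, 1)."""
    f = normal_pdf(y[:, None], mean=draws[None, :])   # shape (N, T)
    return np.sum(np.log(f.mean(axis=1)))

def exact_loglik(y):
    """Exact log-likelihood: integrating N(y; x, 1) over x ~ N(0, 1) gives N(y; 0, 2)."""
    return np.sum(np.log(normal_pdf(y, var=2.0)))

rng = np.random.default_rng(1)
y = rng.normal(0.0, np.sqrt(2.0), size=50)   # toy data drawn from the exact marginal
exact = exact_loglik(y)

# Absolute simulation error for an increasing number of draws T
errors = {T: abs(simulated_loglik(y, rng.standard_normal(T)) - exact)
          for T in (10, 1_000, 100_000)}
```

Inspecting `errors` should show the discrepancy shrinking as T grows, which is the behavior the strong law of large numbers argument relies on; for small T the logarithm of a noisy average is also biased, which is why T must grow with N.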
The procedure for maximizing Equation A2 works as follows.
1. Assume $z \sim \phi(z \mid \mu, \Sigma)$, multivariate normal. Fix a value of T. To draw z from the multivariate normal distribution, first draw NPT values of u, with $u \sim \phi(0, I_P)$ independent normal with the same dimensionality as z.
2. Compute the Choleski decomposition $CC' = \Sigma$. Then $z_t = \mu + C u_t \sim \phi(\mu, \Sigma)$. Store the NPT values of z computed in this way. These values will remain the same throughout the optimization procedure.
3. Compute the simulated likelihood function in Equation A2 based on the stored values of z.
4. Maximize Equation A2 numerically over $\Theta$ using a Newton type algorithm to find the SML estimator. For that purpose one needs the first order derivatives of the simulated log-likelihood function:
\[ \frac{\partial \tilde{L}(\Theta \mid y)}{\partial \Theta} = \sum_{n=1}^{N} \frac{\sum_{t=1}^{T} \partial f(y_n \mid z_t; \Theta) / \partial \Theta}{\sum_{t=1}^{T} f(y_n \mid z_t; \Theta)}. \]
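The four steps above can be sketched on a toy problem. The example below is a simplified illustration, not the model estimated in the article: it assumes a Type-1-style factor Tobit with one factor, unit error variances, censoring at zero, and made-up true parameters. It also swaps the Newton type algorithm with analytical gradients for SciPy's quasi-Newton BFGS with numerical gradients, to keep the sketch short.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(42)
N, J, P, T = 400, 3, 1, 100      # subjects, variables, factors, draws (hypothetical sizes)

# Steps 1-2: draw N*P*T standard normal values once, then set z = mu + C u with C C' = Sigma.
mu_z = np.zeros(P)
Sigma_z = np.eye(P)              # standardized factors (assumption)
C = np.linalg.cholesky(Sigma_z)
u = rng.standard_normal((N, T, P))
z = mu_z + u @ C.T               # stored draws, held fixed during the optimization

# Toy data from an assumed one-factor model censored at zero
a_true = np.array([0.5, 0.0, -0.5])
lam_true = np.array([[1.0], [0.8], [0.6]])
x_true = rng.standard_normal((N, P))
y = np.maximum(a_true + x_true @ lam_true.T + rng.standard_normal((N, J)), 0.0)

def neg_sim_loglik(theta):
    """Step 3: negative simulated log-likelihood (Equation A2) for the toy model."""
    a = theta[:J]
    lam = theta[J:].reshape(J, P)
    m = a + z @ lam.T                                 # conditional means, shape (N, T, J)
    yb = y[:, None, :]                                # broadcast data over the T draws
    # Density for positive observations, censoring probability for zeros
    log_f = np.where(yb > 0, norm.logpdf(yb - m), norm.logcdf(-m)).sum(axis=2)
    # ln[(1/T) * sum_t f(y_n | z_t)] computed stably on the log scale
    return -np.sum(logsumexp(log_f, axis=1) - np.log(T))

# Step 4: numerical maximization (quasi-Newton BFGS here, an assumed substitute)
theta0 = np.concatenate([np.zeros(J), 0.5 * np.ones(J * P)])
res = minimize(neg_sim_loglik, theta0, method="BFGS")
a_hat = res.x[:J]
lam_hat = res.x[J:].reshape(J, P)
```

Because the draws `z` are generated once and reused for every evaluation, the simulated objective is a smooth deterministic function of the parameters, which is what makes gradient-based optimizers usable. The positive starting values for the loadings are a convenience that steers the optimizer toward one of the two sign-equivalent solutions.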
Details on Estimation of the Proposed Model
In order to speed up estimation, we replace the standard normal ogive by a logistic approximation, so that the individual log-likelihood contribution is computed as:
This approximation of the cumulative normal by the cumulative logistic distribution function is accurate, since there is a close similarity in shape between the normal and logistic distributions, while the difference, attributed to the longer tails of the logistic, has hardly any effect on the cumulative distribution function (Johnson, Kotz & Balakrishnan, 1995, p. 119). Alternatively, we could have formulated the model in terms of the logistic distribution rather than the normal, but since that seems counter to the current practice of Tobit modeling, we will not do so. The gradients needed for a Newton-Raphson search are:
dy |zt ;⌰id1− Q i
dy |zt ;⌰id1− Q i
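The accuracy of the logistic approximation to the normal ogive described in this section can be checked numerically. The article does not state which scaling constant it uses; the sketch below assumes the commonly used value 1.702, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def logistic_cdf(x, scale=1.702):
    """Logistic approximation to the standard normal CDF (scale constant is an assumption)."""
    return 1.0 / (1.0 + np.exp(-scale * np.asarray(x, dtype=float)))

# Largest absolute gap between the two CDFs on a fine grid
grid = np.linspace(-6.0, 6.0, 2401)
max_abs_err = np.max(np.abs(norm.cdf(grid) - logistic_cdf(grid)))
```

With this constant, `max_abs_err` stays below 0.01 over the grid, which is consistent with the statement that the heavier logistic tails have hardly any effect on the cumulative distribution function.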
Once the parameters of the model are estimated, each subject n can be evaluated along the latent dimensions, by solving the non-linear equations below for the factor scores, $x_n$:
indicates a sum over the non-censored data.