
The probit model

The probit model is a regression-type model where the dependent variable takes only a finite number of values and the error term is normally distributed (Agresti 2015). Its purpose is to estimate the probability that the dependent variable takes a certain discrete value. The most common applications are discrete choice scenarios: the dependent variable is one of finitely many, mutually exclusive alternatives, and the explanatory variables typically are characteristics of the deciders or of the alternatives.

To be concrete, assume that we possess data on $N$ decision makers who choose between $J \geq 2$ alternatives at each of $T$ choice occasions. Specific to each decision maker, alternative, and choice occasion, we furthermore observe $P$ choice attributes that we use to explain the choices. The continuous choice attributes cannot be linked directly to the discrete choices but must take a detour over a latent variable. In the discrete choice setting, this variable can be interpreted as the decider's utility for a certain alternative. Decider $n$'s utility $U_{ntj}$ for alternative $j$ at choice occasion $t$ is modeled as

\begin{equation} U_{ntj} = X_{ntj}'\beta + \epsilon_{ntj} \end{equation}

for $n=1,\dots,N$, $t=1,\dots,T$, and $j=1,\dots,J$, where

  • $X_{ntj}$ is a (column) vector of $P$ characteristics of $j$ as faced by $n$ at $t$,

  • $\beta \in \mathbb{R}^P$ is a vector of coefficients,

  • and $(\epsilon_{nt:}) = (\epsilon_{nt1},\dots,\epsilon_{ntJ})' \sim \text{MVN}_J(0,\Sigma)$ is the model's error term vector for $n$ at $t$, which in the probit model is assumed to be multivariate normally distributed with zero mean and covariance matrix $\Sigma$.

Now let $y_{nt} = j$ denote the event that decision maker $n$ chooses alternative $j$ at choice occasion $t$. Assuming utility-maximizing behavior of the decision makers, the decisions are linked to the utilities via

\begin{equation} y_{nt} = {\arg \max}_{j = 1,\dots,J} U_{ntj}. \end{equation}
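The utility-maximization rule above can be sketched in a few lines of base R. The dimensions, the coefficient vector, and the covariance matrix below are illustrative values, not defaults of the package:

```r
set.seed(1)

N <- 100 # deciders
J <- 3   # alternatives
P <- 2   # choice attributes
beta <- c(1, -0.5) # illustrative coefficient vector
Sigma <- diag(J)   # illustrative error term covariance matrix

# simulate one choice occasion per decider
choices <- sapply(seq_len(N), function(n) {
  X <- matrix(rnorm(J * P), nrow = J) # row j holds the attributes X_ntj'
  eps <- t(chol(Sigma)) %*% rnorm(J)  # a MVN_J(0, Sigma) draw
  U <- X %*% beta + eps               # utilities U_ntj
  which.max(U)                        # y_nt = argmax_j U_ntj
})

table(choices)
```

Here the multivariate normal draw is produced from the Cholesky factor of `Sigma`, which keeps the sketch free of package dependencies.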

In the ordered probit case, the concept of deciders having separate utilities for each alternative is no longer natural (Train 2009). Instead, we model only a single utility value \begin{align*} U_{nt} = X_{nt}'\beta_n + \epsilon_{nt} \end{align*} per decider $n$ and choice occasion $t$, which we interpret as the “level of association” that $n$ has with the choice question. The utility value falls into discrete categories, which in turn are linked to the ordered alternatives $j=1,\dots,J$. Formally, \begin{align*} y_{nt} = \sum_{j = 1,\dots,J} j \cdot I(\gamma_{j-1} < U_{nt} \leq \gamma_{j}), \end{align*} with end points $\gamma_0 = -\infty$ and $\gamma_J = +\infty$, and thresholds $(\gamma_j)_{j=1,\dots,J-1}$. To ensure monotonicity of the thresholds, we rather estimate logarithmic threshold increments $d_j$ with $\gamma_j = \sum_{i=1,\dots,j} \exp(d_i)$, $j=1,\dots,J-1$.
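The threshold construction from logarithmic increments can be illustrated directly in base R; the increment values `d` below are made up for the example:

```r
# illustrative log threshold increments for J = 4 ordered alternatives
d <- c(0, -1, 0.5)

# thresholds gamma_j = sum_{i <= j} exp(d_i), monotone by construction
# (padded with the end points gamma_0 = -Inf and gamma_J = +Inf)
gamma <- c(-Inf, cumsum(exp(d)), Inf)

# map a latent utility value to its ordered category,
# i.e. the j with gamma_{j-1} < U <= gamma_j
category <- function(U) findInterval(U, gamma, left.open = TRUE)

category(0.5) # falls into (-Inf, gamma_1] with gamma_1 = 1, hence category 1
```

Because each increment enters through `exp()`, the cumulative sums are strictly increasing regardless of the sign of the `d` values.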

Choice behavior heterogeneity

Note that the coefficient vector $\beta$ is constant across decision makers. This assumption is too restrictive for many applications. Heterogeneity in choice behavior can be modeled by imposing a distribution on $\beta$ such that each decider can have their own preferences.

Formally, we define $\beta = (\alpha, \beta_n)$, where $\alpha$ are $P_f$ coefficients that are constant across deciders and $\beta_n$ are $P_r$ decider-specific coefficients. Consequently, $P = P_f + P_r$. Now if $P_r > 0$, $\beta_n$ is distributed according to some $P_r$-variate distribution, the so-called mixing distribution.

Choosing an appropriate mixing distribution is a notoriously difficult part of the model specification. In many applications, different types of standard parametric distributions (including the normal, log-normal, uniform, and tent distributions) are tried in conjunction with a likelihood value-based model selection, cf. Train (2009), Chapter 6. Instead, RprobitB implements the approach of Oelschläger and Bauer (2020) to approximate any underlying mixing distribution by a mixture of (multivariate) Gaussian densities. More precisely, the underlying mixing distribution $g_{P_r}$ for the random coefficients $(\beta_n)_n$ is approximated by a mixture of $P_r$-variate normal densities $\phi_{P_r}$ with mean vectors $b=(b_c)_c$ and covariance matrices $\Omega=(\Omega_c)_c$ using $C$ components, i.e.

\begin{equation} \beta_n \mid b,\Omega \sim \sum_{c=1}^{C} s_c \phi_{P_r} (\cdot \mid b_c,\Omega_c). \end{equation}

Here, $(s_c)_c$ are weights satisfying $0 < s_c \leq 1$ for $c=1,\dots,C$ and $\sum_c s_c = 1$. One interpretation of the latent class model is obtained by introducing variables $z = (z_n)_n$, allocating each decision maker $n$ to class $c$ with probability $s_c$, i.e.

\begin{equation} \text{Prob}(z_n=c)=s_c \land \beta_n \mid z,b,\Omega \sim \phi_{P_r}(\cdot \mid b_{z_n},\Omega_{z_n}). \end{equation}
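This two-step interpretation — first draw a class membership, then draw the coefficient from that class — can be sketched in base R. The weights, means, and covariance matrices below are illustrative:

```r
set.seed(1)

N <- 1000
s <- c(0.6, 0.4)                      # class weights, summing to 1
b <- list(c(-1, 1), c(1, 2))          # class means b_c
Omega <- list(diag(2), 0.1 * diag(2)) # class covariance matrices Omega_c

# allocate each decider n to a class, Prob(z_n = c) = s_c
z <- sample(seq_along(s), N, replace = TRUE, prob = s)

# draw beta_n given z_n from the P_r-variate normal of its class
beta <- sapply(seq_len(N), function(n) {
  b[[z[n]]] + t(chol(Omega[[z[n]]])) %*% rnorm(2)
})

rowMeans(beta) # roughly s_1 * b_1 + s_2 * b_2 = (-0.2, 1.4)
```

Marginally over `z`, the draws follow the Gaussian mixture from the previous equation, so the sample mean approximates the weighted average of the class means.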

We call the resulting model the latent class mixed multinomial probit model. Note that the model collapses to the (normally) mixed multinomial probit model if $P_r > 0$ and $C = 1$, to the multinomial probit model if $P_r = 0$, and to the binary probit model if additionally $J = 2$.

Model normalization

As is well known, any utility model needs to be normalized with respect to level and scale in order to be identified (Train 2009). Therefore, we consider the transformed model

\begin{equation} \tilde{U}_{ntj} = \tilde{X}_{ntj}' \beta + \tilde{\epsilon}_{ntj}, \end{equation}

$n=1,\dots,N$, $t=1,\dots,T$, and $j=1,\dots,J-1$, where (choosing $J$ as the reference alternative) $\tilde{U}_{ntj} = U_{ntj} - U_{ntJ}$, $\tilde{X}_{ntj} = X_{ntj} - X_{ntJ}$, and $\tilde{\epsilon}_{ntj} = \epsilon_{ntj} - \epsilon_{ntJ}$, with $(\tilde{\epsilon}_{nt:}) = (\tilde{\epsilon}_{nt1},\dots,\tilde{\epsilon}_{nt(J-1)})' \sim \text{MVN}_{J-1}(0,\tilde{\Sigma})$, where $\tilde{\Sigma}$ denotes a covariance matrix with the top-left element restricted to one.
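The level normalization amounts to multiplying with a differencing matrix. The following base R sketch applies it to an illustrative undifferenced covariance matrix (here simply the identity) and then fixes the scale:

```r
J <- 3

# differencing matrix w.r.t. reference alternative J:
# row j of D computes the difference of component j and component J
D <- cbind(diag(J - 1), -1)

# an illustrative undifferenced error covariance matrix
Sigma <- diag(J)

# covariance of the differenced errors epsilon_tilde
Sigma_tilde <- D %*% Sigma %*% t(D)
Sigma_tilde
#>      [,1] [,2]
#> [1,]    2    1
#> [2,]    1    2

# scale normalization: restrict the top-left element to one
Sigma_tilde / Sigma_tilde[1, 1]
```

Dividing by the top-left element is one convention for the scale normalization; it matches the restriction on $\tilde{\Sigma}$ stated above.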

Parameter labels

In RprobitB, the probit model parameters are saved as an RprobitB_parameter object. Their labels are consistent with their definition in this vignette. For example:

RprobitB:::RprobitB_parameter(
  P_f = 1,
  P_r = 2,
  J = 3,
  N = 10,
  C = 2, # the number of latent classes
  alpha = c(1), # the fixed coefficient vector of length 'P_f'
  s = c(0.6, 0.4), # the vector of class weights of length 'C'
  b = matrix(c(-1, 1, 1, 2), nrow = 2, ncol = 2),
  # the matrix of class means as columns of dimension 'P_r' x 'C'
  Omega = matrix(c(diag(2), 0.1 * diag(2)), nrow = 4, ncol = 2),
  # the matrix of class covariance matrices as columns of dimension 'P_r^2' x 'C'
  Sigma = diag(2), # the differenced error term covariance matrix of dimension '(J-1)' x '(J-1)'
  # the undifferenced error term covariance matrix is labeled 'Sigma_full'
  z = rep(1:2, 5) # the vector of the allocation variables of length 'N'
)
#> alpha : 1
#> 
#> C : 2
#> 
#> s : double vector of length 2 
#> 0.6 0.4
#> 
#> b : 2 x 2 matrix of doubles 
#>      [,1] [,2]
#> [1,]   -1    1
#> [2,]    1    2
#> 
#> 
#> Omega : 4 x 2 matrix of doubles 
#>      [,1] [,2]
#> [1,]    1  0.1
#> [2,]    0    0
#> [3,]    0    0
#> [4,]    1  0.1
#> 
#> 
#> Sigma : 2 x 2 matrix of doubles 
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> 
#> 
#> Sigma_full : 3 x 3 matrix of doubles 
#>      [,1] [,2] [,3]
#> [1,]    2    1    1
#> [2,]    1    2    1
#> [3,]    1    1    1
#> 
#> 
#> beta : 2 x 10 matrix of doubles 
#>       [,1] [,2]  [,3] ... [,10]
#> [1,] -0.03 0.97 -0.25 ...  1.52
#> [2,] -0.01 1.82  0.07 ...  1.75
#> 
#> 
#> z : double vector of length 10 
#> 1 2 1 ... 2
#> 
#> d : NA

Mind that the matrix Sigma_full is not unique: it can be any matrix that results in Sigma after the differencing, see the non-exported function RprobitB:::undiff_Sigma().
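For the example above, this can be checked by hand with base R: applying the differencing matrix with respect to reference alternative $J = 3$ to the printed Sigma_full recovers the printed Sigma:

```r
# the undifferenced covariance matrix from the output above
Sigma_full <- matrix(c(2, 1, 1,
                       1, 2, 1,
                       1, 1, 1), nrow = 3)

# differencing matrix w.r.t. reference alternative J = 3
D <- cbind(diag(2), -1)

# recovers Sigma = diag(2) from the output above
D %*% Sigma_full %*% t(D)
```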

References

Agresti, A. 2015. Foundations of Linear and Generalized Linear Models. Wiley.
Bliss, C. I. 1934. “The Method of Probits.” Science 79 (2037). https://doi.org/10.1126/science.79.2037.38.
Hewig, Johannes, Nora Kretschmer, Ralf H. Trippe, Holger Hecht, Michael G. H. Coles, Clay B. Holroyd, and Wolfgang H. R. Miltner. 2011. “Why Humans Deviate from Rational Choice.” Psychophysiology 48 (4): 507–14. https://doi.org/10.1111/j.1469-8986.2010.01081.x.
Oelschläger, L., and D. Bauer. 2020. “Bayes Estimation of Latent Class Mixed Multinomial Probit Models.” TRB Annual Meeting 2021.
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge University Press. https://doi.org/10.1017/CBO9780511805271.