This function prepares empirical choice data for use with probit models and returns it as an object of class RprobitB_data.

Usage

prepare_data(
  form,
  choice_data,
  re = NULL,
  alternatives = NULL,
  id = "id",
  idc = NULL,
  standardize = NULL,
  impute = "complete_cases"
)

Arguments

form

A formula object that is used to specify the probit model. The structure is choice ~ A | B | C, where

• A are names of alternative and choice situation specific covariates with a generic coefficient,

• B are names of choice situation specific covariates with alternative specific coefficients,

• and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Separate multiple covariates of one type by a + sign. By default, alternative specific constants (ASCs) are added to the model (for all except for the last alternative due to identifiability). They can be removed by adding +0 in the second spot. See the vignette on choice data for more details.
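
As a rough illustration of the three-part structure, the formulas below can be built in base R without loading the package. The covariate names price, time, income and comfort are hypothetical placeholders and do not refer to any particular data set.

```r
# Hypothetical covariate names, used only to illustrate the A | B | C layout.

# Type A only: price and time each get one generic coefficient;
# ASCs are added by default.
form_a <- choice ~ price + time

# Same covariates, but "| 0" in the second spot removes the ASCs.
form_a_no_asc <- choice ~ price + time | 0

# All three types: price (A), income (B), comfort (C).
form_abc <- choice ~ price | income | comfort

# Variable names the formula refers to, in order of appearance.
vars_abc <- all.vars(form_abc)
print(vars_abc)
```

Whether a given formula is accepted by the package can be checked with check_form(), listed in the related functions of this page.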

choice_data

A data frame of choice data in wide format, i.e. each row represents one choice occasion.

re

A character vector with the names of the covariates in form that have random effects. If re = NULL (the default), there are no random effects. To have random effects for the alternative specific constants, include "ASC" in re.

alternatives

A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices.

id

A character, the name of the column in choice_data that contains a unique identifier for each decision maker. The default is "id".

idc

A character, the name of the column in choice_data that contains a unique identifier for each choice situation of each decision maker. The default is NULL, in which case these identifiers are generated automatically.
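
One way to picture the automatic numbering is to count the choice situations of each decision maker in order of appearance. The following base R sketch shows that idea under this assumption; it is not the package's internal code, and the toy id vector is made up.

```r
# Toy decision maker identifiers (made up); rows are assumed to appear
# in the order in which the choices were made.
id <- c(1, 1, 2, 2, 2)

# Number the rows within each decision maker by order of appearance.
idc <- ave(seq_along(id), id, FUN = seq_along)

print(data.frame(id = id, idc = idc))
```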

standardize

A character vector with the names of the covariates to standardize. Covariates of type 1 or 3 (A or C in form) have to be addressed by <covariate>_<alternative>. If standardize = "all", all covariates get standardized.
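
The text does not spell out the transformation. Assuming that standardizing means centering a column to mean zero and scaling it to unit standard deviation, the effect on a single wide-format column can be sketched in base R. The column name price_A follows the <covariate>_<alternative> pattern and is hypothetical.

```r
# Toy wide-format data; price_A is a hypothetical alternative specific column.
choice_data <- data.frame(price_A = c(10, 20, 30, 40))

# Assumed definition of standardization: (x - mean) / standard deviation.
x <- choice_data$price_A
choice_data$price_A <- (x - mean(x)) / sd(x)

print(choice_data)
```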

impute

A character that specifies how to handle missing entries (the elements of as_missing) in choice_data, one of:

• "complete_cases", removes all rows containing missing entries (the default),

• "zero_out", replaces missing entries by zero (only for numeric columns),

• "mean", imputes missing entries by the covariate mean (only for numeric columns).
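
The three strategies can be mimicked in base R on a toy data frame. This is only a sketch of the behavior described in the list above, assuming that missing entries are coded as NA; the column names and values are made up.

```r
# Toy data with one missing entry (NA) in a numeric covariate column.
toy <- data.frame(id = c(1, 1, 2), price_A = c(10, NA, 30))

# "complete_cases": drop every row that contains a missing entry.
complete <- toy[complete.cases(toy), ]

# "zero_out": replace missing numeric entries by zero.
zeroed <- toy
zeroed$price_A[is.na(zeroed$price_A)] <- 0

# "mean": replace missing entries by the mean of the observed values.
imputed <- toy
imputed$price_A[is.na(imputed$price_A)] <- mean(toy$price_A, na.rm = TRUE)

print(list(complete = complete, zeroed = zeroed, imputed = imputed))
```

Dropping rows shrinks the sample, while the other two strategies keep every choice occasion at the cost of inserting artificial covariate values.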

Value

An object of class RprobitB_data.

Details

Requirements for choice_data:

• It must contain a column named id which contains a unique identifier for each decision maker.

• It can contain a column named idc which contains a unique identifier for each choice situation of each decision maker. If this information is missing, these identifiers are generated automatically by the appearance of the choices in the data set.

• It can contain a column named choice with the observed choices, where choice must match the name of the dependent variable in form. Such a column is required for model fitting but not for prediction.

• It must contain a numeric column named p_j for each alternative specific covariate p in form and each choice alternative j in alternatives.

• It must contain a numeric column named q for each covariate q in form that is constant across alternatives.
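
A small data frame that follows these naming rules might look as follows. All names and values are made up for illustration: two alternatives "A" and "B", one alternative specific covariate price, and one covariate income that is constant across alternatives.

```r
# Toy wide-format data: one row per choice occasion (all values made up).
choice_data <- data.frame(
  id      = c(1, 1, 2),        # decision maker
  idc     = c(1, 2, 1),        # choice situation within decision maker
  choice  = c("A", "B", "A"),  # observed choice
  price_A = c(10, 12, 11),     # alternative specific covariate "price"
  price_B = c(9, 13, 10),
  income  = c(50, 50, 70)      # constant across alternatives
)

# Column names implied by the naming rules for this toy setup.
alternatives <- c("A", "B")
required <- c("id", paste0("price_", alternatives), "income")

# Required columns that are absent (empty if all are present).
missing_cols <- setdiff(required, names(choice_data))
print(missing_cols)
```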

See the vignette on choice data for more details.

See also

• check_form() for checking the model formula

• overview_effects() for an overview of the model effects

• create_lagged_cov() for creating lagged covariates

• as_cov_names() for renaming alternative-specific covariates

• simulate_choices() for simulating choice data

• train_test() for splitting choice data into a train and test subset

Examples

data("Train", package = "mlogit")
data <- prepare_data(
  form = choice ~ price + time + comfort + change | 0,
  choice_data = Train,
  re = c("price", "time"),
  id = "id",
  idc = "choiceid",
  standardize = c("price", "time")
)