This function prepares choice data for estimation.
Usage
prepare_data(
  form,
  choice_data,
  re = NULL,
  alternatives = NULL,
  ordered = FALSE,
  ranked = FALSE,
  base = NULL,
  id = "id",
  idc = NULL,
  standardize = NULL,
  impute = "complete_cases"
)
Arguments
- form
[formula]
A model description with the structure choice ~ A | B | C, where
  - choice is the name of the dependent variable (the choices),
  - A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
  - B are names of choice situation specific covariates with alternative specific coefficients,
  - and C are names of alternative and choice situation specific covariates with alternative specific coefficients.
Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding + 0 in the second spot.
In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.
- choice_data
[data.frame]
Choice data in wide format, where each row represents one choice occasion.
- re
[character() | NULL]
Names of covariates with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.
- alternatives
[character()]
The names of the choice alternatives. If not specified, the choice set is defined by the observed choices.
If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best.
- ordered
[logical(1)]
If TRUE, the choice set alternatives is assumed to be ordered from worst to best.
- ranked
[logical(1)]
Are the alternatives ranked?
- base
[character(1)]
The name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs).
Ignored and set to NULL if the model has no alternative specific covariates (e.g. in the ordered probit model).
By default, base is the last element of alternatives.
- id
[character(1)]
The name of the column in choice_data that contains a unique identifier for each decision maker.
- idc
[character(1)]
The name of the column in choice_data that contains a unique identifier for each choice situation of each decision maker. By default, these identifiers are generated by the order of appearance.
- standardize
[character() | "all"]
Names of covariates that get standardized.
Covariates of type 1 or 3 have to be addressed by <covariate>_<alternative>.
If standardize = "all", all covariates get standardized.
- impute
A character that specifies how to handle missing covariate entries in choice_data, one of:
  - "complete_cases", removes all rows containing missing covariate entries (the default),
  - "zero", replaces missing covariate entries by zero (only for numeric columns),
  - "mean", imputes missing covariate entries by the mean (only for numeric columns).
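The three impute options can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy data, the column names price_A and price_B, and the use of the column mean over the observed values for "mean" are assumptions made for illustration only.

```r
# Toy data with missing covariate entries (hypothetical column names).
choice_data <- data.frame(
  id = 1:4,
  choice = c("A", "B", "A", "B"),
  price_A = c(1, NA, 3, 4),
  price_B = c(2, 2, NA, 6)
)
covariates <- c("price_A", "price_B")

# "complete_cases": drop every row with a missing covariate entry.
complete <- choice_data[complete.cases(choice_data[covariates]), ]

# "zero": replace missing numeric entries by zero.
zero <- choice_data
zero[covariates] <- lapply(
  zero[covariates],
  function(x) replace(x, is.na(x), 0)
)

# "mean": replace missing numeric entries by the mean of the observed
# values in the same column (assumed here; the package may differ).
mean_imputed <- choice_data
mean_imputed[covariates] <- lapply(
  mean_imputed[covariates],
  function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
)
```

In this sketch, "complete_cases" shrinks the data set, while "zero" and "mean" keep every row, which matters when each decision maker has only a few choice occasions.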
Details
Requirements for the data.frame choice_data:
- It must contain a column named id which contains a unique identifier for each decision maker.
- It can contain a column named idc which contains a unique identifier for each choice situation of each decision maker. If this information is missing, these identifiers are generated automatically by the order of appearance of the choices in the data set.
- It can contain a column named choice with the observed choices, where choice must match the name of the dependent variable in form. Such a column is required for model fitting but not for prediction.
- It must contain a numeric column named p_j for each alternative specific covariate p in form and each choice alternative j in alternatives.
- It must contain a numeric column named q for each covariate q in form that is constant across alternatives.
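The column requirements above can be sketched with a small wide-format data set in base R. The model choice ~ price | income | time, the alternatives A and B, and all values are hypothetical. The check at the end is an illustration only, not a function of the package.

```r
# Hypothetical model: choice ~ price | income | time, alternatives A and B.
alternatives <- c("A", "B")
choice_data <- data.frame(
  id      = c(1, 1, 2, 2),          # decision maker
  idc     = c(1, 2, 1, 2),          # choice situation per decision maker
  choice  = c("A", "B", "B", "A"),  # observed choices
  price_A = c(1.5, 1.7, 1.4, 1.9),  # alternative specific covariate p_j
  price_B = c(2.0, 1.8, 2.1, 1.6),
  income  = c(30, 30, 45, 45),      # covariate constant across alternatives
  time_A  = c(20, 25, 22, 21),
  time_B  = c(15, 18, 17, 19)
)

# Column names expected for the alternative specific covariates.
alt_specific <- c("price", "time")
expected <- as.vector(outer(alt_specific, alternatives, paste, sep = "_"))

# Columns the requirements ask for in this hypothetical model.
required <- c("id", "choice", expected, "income")
missing_cols <- setdiff(required, names(choice_data))
```

An empty missing_cols indicates that every covariate in the formula has a matching column, one per alternative for price and time and a single column for income.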
In the ranked case (ranked = TRUE), the column choice must
contain the full ranking of the alternatives in each choice occasion as a
character, where the alternatives are separated by commas (see the examples).
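The comma-separated ranking strings described above can be checked with a few lines of base R. This is a sketch under the assumption of three hypothetical alternatives A, B, and C. It only verifies that each string lists every alternative exactly once and is not part of the package.

```r
# Hypothetical ranked choices, one string per choice occasion.
ranked_choice <- c("A,B,C", "A,C,B", "B,C,A")
alternatives <- c("A", "B", "C")

# Split each string at the commas into a vector of alternatives.
rankings <- strsplit(ranked_choice, ",", fixed = TRUE)

# A full ranking contains each alternative exactly once.
is_full_ranking <- vapply(
  rankings,
  function(r) length(r) == length(alternatives) && setequal(r, alternatives),
  logical(1)
)
```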
See the vignette on choice data for more details.
See also
- check_form() for checking the model formula
- overview_effects() for an overview of the model effects
- create_lagged_cov() for creating lagged covariates
- as_cov_names() for re-labeling alternative-specific covariates
- simulate_choices() for simulating choice data
- train_test() for splitting choice data into a train and test subset
Examples
data <- prepare_data(
  form = choice ~ price + time + comfort + change | 0,
  choice_data = train_choice,
  re = c("price", "time"),
  id = "deciderID",
  idc = "occasionID",
  standardize = c("price", "time")
)
#> Checking for missing covariates

### ranked case
choice_data <- data.frame(
  "id" = 1:3, "choice" = c("A,B,C", "A,C,B", "B,C,A"), "cov" = 1
)
data <- prepare_data(
  form = choice ~ 0 | cov + 0,
  choice_data = choice_data,
  ranked = TRUE
)
#> Checking for missing covariates
