The choice_data object defines the choice data. It combines
choice_responses and choice_covariates.
Usage
choice_data(
  data_frame,
  format = "wide",
  column_choice = "choice",
  column_decider = "deciderID",
  column_occasion = NULL,
  column_alternative = NULL,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  delimiter = "_",
  cross_section = is.null(column_occasion),
  choice_type = c("discrete", "ordered", "ranked")
)
generate_choice_data(
  choice_effects,
  choice_identifiers = generate_choice_identifiers(N = 100),
  choice_covariates = NULL,
  choice_parameters = NULL,
  choice_preferences = NULL,
  column_choice = "choice",
  choice_type = c("auto", "discrete", "ordered", "ranked")
)
long_to_wide(
  data_frame,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  column_choice = "choice",
  column_alternative = "alternative",
  column_decider = "deciderID",
  column_occasion = NULL,
  alternatives = unique(data_frame[[column_alternative]]),
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)
wide_to_long(
  data_frame,
  column_choice = "choice",
  column_alternative = "alternative",
  alternatives = NULL,
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)
Arguments
- data_frame
[data.frame]
Contains the choice data.
- format
[character(1)]
Format of data_frame. Use "wide" when each row contains all alternatives of an occasion and "long" when each row contains a single alternative.
- column_choice
[character(1)]
Column name with the observed choices. In wide layout this column should contain a single value per observation: for discrete data the value is the label of the chosen alternative, for ordered data it is the ordered factor or integer score, and for ranked data it is omitted in favour of one column per alternative (see choice_type). In long layout the same column is evaluated once per alternative: discrete data must use a binary indicator (1 for the chosen alternative, 0 otherwise), ordered data repeats the ordinal value for every alternative, and ranked data stores the integer rank 1:J for each alternative within an observation. Set to NULL when no observed choices are available (e.g., for purely covariate tables).
- column_decider
[character(1)]
Column name with decider identifiers.
- column_occasion
[character(1)|NULL]
Column name with occasion identifiers. Set to NULL in cross-sectional data.
- column_alternative
[character(1)|NULL]
Column name with alternative identifiers when format = "long".
- column_ac_covariates
[character()|NULL]
Column names with alternative-constant covariates.
- column_as_covariates
[character()]
Column names of data_frame with alternative-specific covariates.
- delimiter
[character(1)]
Delimiter separating alternative identifiers from covariate names in wide format. May consist of one or more characters.
- cross_section
[logical(1)]
Treat choice data as cross-sectional?
- choice_type
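[character(1)]
Requested response type. In generate_choice_data(), use "auto" (the default) to infer the mode from choice_alternatives(), or explicitly simulate "discrete", "ordered", or "ranked" outcomes. The other functions accept "discrete", "ordered", or "ranked".
- choice_effects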
[choice_effects]
A choice_effects object describing the model.
- choice_identifiers
[choice_identifiers]
A choice_identifiers object that provides the decider and occasion identifiers.
- choice_covariates
[choice_covariates]
Covariates to include in the generated data.
- choice_parameters
[choice_parameters]
Model parameters supplying utilities and covariance structures.
- choice_preferences
[choice_preferences]
Decider-specific preference draws used for simulation.
- alternatives
[character(J)]
Unique labels for the choice alternatives.
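The long-layout encodings described under column_choice can be pictured with a few hand-built data frames. This is only an illustrative sketch in base R: the values are made up, the column names follow the defaults in Usage, and nothing here calls the package itself.

```r
# One observation (decider 1) with J = 3 alternatives in long layout.
# All values are invented for illustration.
alternatives <- c("A", "B", "C")

# Discrete: binary indicator, 1 for the chosen alternative (here B).
long_discrete <- data.frame(
  deciderID   = 1,
  alternative = alternatives,
  choice      = c(0, 1, 0)
)

# Ordered: the same ordinal value is repeated for every alternative.
long_ordered <- data.frame(
  deciderID   = 1,
  alternative = alternatives,
  choice      = 2
)

# Ranked: each alternative carries one of the integer ranks 1:J.
long_ranked <- data.frame(
  deciderID   = 1,
  alternative = alternatives,
  choice      = c(2L, 1L, 3L)
)

# Wide counterpart of the discrete case: a single row whose choice
# column holds the label of the chosen alternative.
wide_discrete <- data.frame(deciderID = 1, choice = "B")
```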
Details
choice_data() acts as the main entry point for observed data. It accepts
either long or wide layouts and performs validation before
returning a tidy tibble with consistent identifiers. Columns that refer to
the same alternative are aligned using delimiter so that downstream helpers
can detect them automatically. When used with ranked or ordered choices the
function checks that rankings are complete and warns about inconsistencies.
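To make the completeness check for rankings concrete, the following base R sketch flags observations whose ranks are not a permutation of 1:J. The helper incomplete_rankings() is hypothetical and written only for this illustration; it is not the package's implementation, and the data are made up.

```r
# Hypothetical helper (not part of the package): returns the identifiers
# of observations whose ranks are not a permutation of 1:J.
incomplete_rankings <- function(data, column_obs, column_rank, J) {
  ranks_by_obs <- split(data[[column_rank]], data[[column_obs]])
  complete <- vapply(
    ranks_by_obs,
    function(r) length(r) == J && setequal(r, seq_len(J)),
    logical(1)
  )
  names(complete)[!complete]
}

# Invented ranked data in long layout with J = 3 alternatives.
ranked <- data.frame(
  deciderID   = rep(1:2, each = 3),
  alternative = rep(c("A", "B", "C"), times = 2),
  choice      = c(2, 1, 3,   # decider 1: complete ranking
                  1, 1, 3)   # decider 2: rank 2 is missing
)

incomplete_rankings(ranked, "deciderID", "choice", J = 3)
#> [1] "2"
```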
Internally the helper converts long inputs to wide format. This guarantees that subsequent steps (such as computing probabilities) receive the same structure regardless of the original layout and keeps the workflow concise.
generate_choice_data() simulates choice data. wide_to_long() and long_to_wide() convert between wide and long format.
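The relationship between the two layouts can be sketched with base R's reshape(). This is an assumption-laden illustration rather than how long_to_wide() works internally: the data are invented, and the wide column names follow the covariate, delimiter, alternative pattern seen in the Examples output (e.g. wait_plane).

```r
# Invented long data: two deciders, two alternatives, one
# alternative-specific covariate (price) and a binary choice indicator.
long <- data.frame(
  deciderID   = rep(1:2, each = 2),
  alternative = rep(c("A", "B"), times = 2),
  price       = c(10, 12, 11, 9),
  choice      = c(1, 0, 0, 1)
)

# Spread the covariate across alternatives; the resulting columns are
# named <covariate><delimiter><alternative>, here price_A and price_B.
wide <- reshape(
  long[, c("deciderID", "alternative", "price")],
  idvar     = "deciderID",
  timevar   = "alternative",
  v.names   = "price",
  direction = "wide",
  sep       = "_"
)

# Collapse the binary indicator into the label of the chosen alternative,
# which is the wide-layout encoding for discrete choices.
chosen <- long[long$choice == 1, c("deciderID", "alternative")]
wide$choice <- as.character(
  chosen$alternative[match(wide$deciderID, chosen$deciderID)]
)
```

Each decider now occupies a single row, which is the structure the Details section describes as the common internal format.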
The generated choice_data object inherits a choice_type attribute for
the requested simulation mode. Ordered alternatives (ordered = TRUE)
yield ordered responses, unordered alternatives default to discrete
multinomial outcomes, and ranked simulations return complete rankings for
every observation.
See also
choice_responses(), choice_covariates(), and choice_identifiers() for
the helper objects that feed into choice_data().
Examples
### simulate data from a multinomial probit model
choice_effects <- choice_effects(
  choice_formula = choice_formula(
    formula = choice ~ A | B, error_term = "probit",
    random_effects = c("A" = "cn")
  ),
  choice_alternatives = choice_alternatives(J = 3)
)
generate_choice_data(choice_effects)
#> # A tibble: 100 × 7
#> deciderID occasionID choice B A_A A_B A_C
#> * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 C 0.118 2.76 0.0465 0.578
#> 2 2 1 C -0.206 -1.91 0.862 -0.243
#> 3 3 1 C -2.27 0.0192 0.0296 0.550
#> 4 4 1 A 1.07 2.68 -0.361 0.213
#> 5 5 1 C -1.18 -0.665 1.11 -0.246
#> 6 6 1 C 0.489 -0.976 1.07 0.132
#> 7 7 1 A 1.34 -1.70 -1.47 0.284
#> 8 8 1 C 0.607 0.237 1.32 0.524
#> 9 9 1 A 1.92 -0.110 0.172 -0.0903
#> 10 10 1 C -0.548 1.30 0.749 0.556
#> # ℹ 90 more rows
### transform between long/wide format
long_to_wide(
  data_frame = travel_mode_choice,
  column_alternative = "mode",
  column_decider = "individual"
)
#> # A tibble: 210 × 16
#> individual income size wait_plane wait_train wait_bus wait_car cost_plane
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 35 1 69 34 35 0 59
#> 2 2 30 2 64 44 53 0 58
#> 3 3 40 1 69 34 35 0 115
#> 4 4 70 3 64 44 53 0 49
#> 5 5 45 2 64 44 53 0 60
#> 6 6 20 1 69 40 35 0 59
#> 7 7 45 1 45 34 35 0 148
#> 8 8 12 1 69 34 35 0 121
#> 9 9 40 1 69 34 35 0 59
#> 10 10 70 2 69 34 35 0 58
#> # ℹ 200 more rows
#> # ℹ 8 more variables: cost_train <int>, cost_bus <int>, cost_car <int>,
#> # travel_plane <int>, travel_train <int>, travel_bus <int>, travel_car <int>,
#> # choice <chr>
wide_to_long(
  data_frame = train_choice
)
#> # A tibble: 5,858 × 8
#> deciderID occasionID choice alternative price time change comfort
#> <int> <int> <int> <chr> <dbl> <dbl> <int> <fct>
#> 1 1 1 1 A 52.9 2.5 0 1
#> 2 1 1 0 B 88.1 2.5 0 1
#> 3 1 2 1 A 52.9 2.5 0 1
#> 4 1 2 0 B 70.5 2.17 0 1
#> 5 1 3 1 A 52.9 1.92 0 1
#> 6 1 3 0 B 88.1 1.92 0 0
#> 7 1 4 0 A 88.1 2.17 0 1
#> 8 1 4 1 B 70.5 2.5 0 0
#> 9 1 5 0 A 52.9 2.5 0 1
#> 10 1 5 1 B 70.5 2.5 0 0
#> # ℹ 5,848 more rows
