The choice_data object defines the choice data. It combines choice_responses and choice_covariates.
Usage
choice_data(
  data_frame,
  format = "wide",
  column_choice = "choice",
  column_decider = "deciderID",
  column_occasion = NULL,
  column_alternative = NULL,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  delimiter = "_",
  cross_section = is.null(column_occasion),
  choice_type = c("discrete", "ordered", "ranked")
)

generate_choice_data(
  choice_effects,
  choice_identifiers = generate_choice_identifiers(N = 100),
  choice_covariates = NULL,
  choice_parameters = NULL,
  choice_preferences = NULL,
  column_choice = "choice",
  choice_type = c("auto", "discrete", "ordered", "ranked")
)

long_to_wide(
  data_frame,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  column_choice = "choice",
  column_alternative = "alternative",
  column_decider = "deciderID",
  column_occasion = NULL,
  alternatives = unique(data_frame[[column_alternative]]),
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)

wide_to_long(
  data_frame,
  column_choice = "choice",
  column_alternative = "alternative",
  alternatives = NULL,
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)
Arguments
- data_frame
  [data.frame]
  Contains the choice data.
- format
  [character(1)]
  Format of data_frame. Use "wide" when each row contains all alternatives of an occasion and "long" when each row contains a single alternative.
- column_choice
  [character(1)]
  Column name with the observed choices. In wide layout this column should contain a single value per observation: for discrete data the value is the label of the chosen alternative, for ordered data it is the ordered factor or integer score, and for ranked data it is omitted in favour of one column per alternative (see choice_type). In long layout the same column is evaluated once per alternative: discrete data must use a binary indicator (1 for the chosen alternative, 0 otherwise), ordered data repeats the ordinal value for every alternative, and ranked data stores the integer rank 1:J for each alternative within an observation. Set to NULL when no observed choices are available (e.g., for purely covariate tables).
- column_decider
  [character(1)]
  Column name with decider identifiers.
- column_occasion
  [character(1) | NULL]
  Column name with occasion identifiers. Set to NULL in cross-sectional data.
- column_alternative
  [character(1) | NULL]
  Column name with alternative identifiers when format = "long".
- column_ac_covariates
  [character() | NULL]
  Column names of data_frame with alternative-constant covariates.
- column_as_covariates
  [character() | NULL]
  Column names of data_frame with alternative-specific covariates.
- delimiter
  [character(1)]
  Delimiter separating alternative identifiers from covariate names in wide format. May consist of one or more characters.
- cross_section
  [logical(1)]
  Treat choice data as cross-sectional?
- choice_type
  [character(1)]
  Requested response type: "discrete", "ordered", or "ranked". generate_choice_data() additionally accepts "auto" (its default), which infers the mode from choice_alternatives(); the other values explicitly simulate "discrete", "ordered", or "ranked" outcomes.
- choice_effects
  [choice_effects]
  A choice_effects object describing the model.
- choice_identifiers
  [choice_identifiers]
  A choice_identifiers object that provides the decider and occasion identifiers.
- choice_covariates
  [choice_covariates]
  Covariates to include in the generated data.
- choice_parameters
  [choice_parameters]
  Model parameters supplying utilities and covariance structures.
- choice_preferences
  [choice_preferences]
  Decider-specific preference draws used for simulation.
- alternatives
  [character(J)]
  Unique labels for the choice alternatives.
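The difference between the two layouts, and the role of delimiter, can be sketched with base R alone. The toy data frame and its values below are made up for illustration, and the code is not the package's implementation; it only mirrors the conventions described above (binary indicator in long layout, chosen label and covariate_alternative columns in wide layout).

```r
# Long layout: one row per alternative, binary choice indicator.
# All values are hypothetical and only serve as an illustration.
long <- data.frame(
  deciderID   = c(1, 1, 2, 2),
  alternative = c("A", "B", "A", "B"),
  choice      = c(1, 0, 0, 1),
  price       = c(52.9, 88.1, 70.5, 60.0),  # alternative-specific
  income      = c(35, 35, 30, 30)           # alternative-constant
)
delimiter <- "_"

# Discrete choice in wide layout: label of the chosen alternative.
chosen <- long[long$choice == 1, c("deciderID", "alternative")]
names(chosen)[2] <- "choice"

# Alternative-constant covariates: one value per decider.
constant <- unique(long[, c("deciderID", "income")])

# Alternative-specific covariates: one column per alternative,
# named <covariate><delimiter><alternative>.
specific <- reshape(
  long[, c("deciderID", "alternative", "price")],
  idvar = "deciderID", timevar = "alternative",
  direction = "wide", sep = delimiter
)

# Wide layout: one row per decider.
wide <- merge(merge(chosen, constant, by = "deciderID"),
              specific, by = "deciderID")
print(wide)
```

In the wide result, price_A and price_B hold the alternative-specific covariate, while income appears once per row.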
Details
choice_data() acts as the main entry point for observed data. It accepts either long or wide layouts and performs extensive validation before returning a tidy tibble with consistent identifiers. Columns that refer to the same alternative are aligned using delimiter so that downstream helpers can detect them automatically. When used with ranked or ordered choices, the function checks that rankings are complete and warns early about inconsistencies.
Internally the helper converts long inputs to wide format. This guarantees that subsequent steps (such as computing probabilities) receive the same structure regardless of the original layout and keeps the workflow concise.
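As a rough illustration of what a completeness check on rankings could look like, the base R sketch below flags deciders whose ranks in long layout are not a permutation of 1:J. The data and the rule are assumptions made for this example; the validation that choice_data() actually performs may differ.

```r
# Hypothetical ranked data in long layout: one rank per alternative.
ranked_long <- data.frame(
  deciderID   = c(1, 1, 1, 2, 2, 2),
  alternative = rep(c("A", "B", "C"), times = 2),
  choice      = c(1, 3, 2,   # decider 1: complete ranking
                  1, 1, 3)   # decider 2: rank 2 missing, rank 1 repeated
)
J <- length(unique(ranked_long$alternative))

# A ranking is treated as complete if it uses each rank in 1:J exactly once.
is_complete <- tapply(
  ranked_long$choice, ranked_long$deciderID,
  function(r) length(r) == J && setequal(r, seq_len(J))
)

# Identifiers of deciders with incomplete rankings.
incomplete_ids <- names(is_complete)[!is_complete]
print(incomplete_ids)
```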
generate_choice_data() simulates choice data. wide_to_long() and long_to_wide() transform to long and wide format, respectively.
The generated choice_data object inherits a choice_type attribute that matches the requested simulation mode. Ordered alternatives (ordered = TRUE) yield ordered responses, unordered alternatives default to discrete multinomial outcomes, and ranked simulations return complete rankings for every observation.
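How the three response types relate to latent utilities can be sketched in base R. This is a simplified illustration under assumed values (a single coefficient beta, standard normal errors, arbitrary thresholds), not the simulation code behind generate_choice_data().

```r
set.seed(1)
N <- 5                          # number of observations (hypothetical)
alternatives <- c("A", "B", "C")
J <- length(alternatives)

# Assumed latent utilities: systematic part plus standard normal errors.
X    <- matrix(rnorm(N * J), nrow = N, dimnames = list(NULL, alternatives))
beta <- 0.5
U    <- beta * X + matrix(rnorm(N * J), nrow = N)

# Discrete: label of the alternative with the highest utility.
discrete <- alternatives[apply(U, 1, which.max)]

# Ranked: complete ranking per observation, rank 1 = most preferred.
ranked <- t(apply(U, 1, function(u) rank(-u)))

# Ordered: a single latent value cut at assumed thresholds.
latent  <- beta * X[, 1] + rnorm(N)
ordered <- cut(latent, breaks = c(-Inf, -0.5, 0.5, Inf),
               labels = alternatives, ordered_result = TRUE)
```

The alternative ranked first always coincides with the discrete choice, because both are derived from the same utilities.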
See also
choice_responses(), choice_covariates(), and choice_identifiers() for the helper objects that feed into choice_data().
Examples
### simulate data from a multinomial probit model
choice_effects <- choice_effects(
  choice_formula = choice_formula(
    formula = choice ~ A | B, error_term = "probit",
    random_effects = c("A" = "cn")
  ),
  choice_alternatives = choice_alternatives(J = 3)
)
generate_choice_data(choice_effects)
#> # A tibble: 100 × 7
#> deciderID occasionID choice B A_A A_B A_C
#> * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 C 0.118 2.76 0.0465 0.578
#> 2 2 1 C -0.206 -1.91 0.862 -0.243
#> 3 3 1 C -2.27 0.0192 0.0296 0.550
#> 4 4 1 A 1.07 2.68 -0.361 0.213
#> 5 5 1 C -1.18 -0.665 1.11 -0.246
#> 6 6 1 C 0.489 -0.976 1.07 0.132
#> 7 7 1 A 1.34 -1.70 -1.47 0.284
#> 8 8 1 C 0.607 0.237 1.32 0.524
#> 9 9 1 A 1.92 -0.110 0.172 -0.0903
#> 10 10 1 C -0.548 1.30 0.749 0.556
#> # ℹ 90 more rows
### transform between long/wide format
long_to_wide(
  data_frame = travel_mode_choice,
  column_alternative = "mode",
  column_decider = "individual"
)
#> # A tibble: 210 × 16
#> individual income size wait_plane wait_train wait_bus wait_car cost_plane
#> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 35 1 69 34 35 0 59
#> 2 2 30 2 64 44 53 0 58
#> 3 3 40 1 69 34 35 0 115
#> 4 4 70 3 64 44 53 0 49
#> 5 5 45 2 64 44 53 0 60
#> 6 6 20 1 69 40 35 0 59
#> 7 7 45 1 45 34 35 0 148
#> 8 8 12 1 69 34 35 0 121
#> 9 9 40 1 69 34 35 0 59
#> 10 10 70 2 69 34 35 0 58
#> # ℹ 200 more rows
#> # ℹ 8 more variables: cost_train <int>, cost_bus <int>, cost_car <int>,
#> # travel_plane <int>, travel_train <int>, travel_bus <int>, travel_car <int>,
#> # choice <chr>
wide_to_long(
  data_frame = train_choice
)
#> # A tibble: 5,858 × 8
#> deciderID occasionID choice alternative price time change comfort
#> <int> <int> <int> <chr> <dbl> <dbl> <int> <fct>
#> 1 1 1 1 A 52.9 2.5 0 1
#> 2 1 1 0 B 88.1 2.5 0 1
#> 3 1 2 1 A 52.9 2.5 0 1
#> 4 1 2 0 B 70.5 2.17 0 1
#> 5 1 3 1 A 52.9 1.92 0 1
#> 6 1 3 0 B 88.1 1.92 0 0
#> 7 1 4 0 A 88.1 2.17 0 1
#> 8 1 4 1 B 70.5 2.5 0 0
#> 9 1 5 0 A 52.9 2.5 0 1
#> 10 1 5 1 B 70.5 2.5 0 0
#> # ℹ 5,848 more rows