Define choice data

A choice_data object holds the choice data. It combines choice_responses and choice_covariates.

Usage

choice_data(
  data_frame,
  format = "wide",
  column_choice = "choice",
  column_decider = "deciderID",
  column_occasion = NULL,
  column_alternative = NULL,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  delimiter = "_",
  cross_section = is.null(column_occasion),
  choice_type = c("discrete", "ordered", "ranked")
)

generate_choice_data(
  choice_effects,
  choice_identifiers = generate_choice_identifiers(N = 100),
  choice_covariates = NULL,
  choice_parameters = NULL,
  choice_preferences = NULL,
  column_choice = "choice",
  choice_type = c("auto", "discrete", "ordered", "ranked")
)

long_to_wide(
  data_frame,
  column_ac_covariates = NULL,
  column_as_covariates = NULL,
  column_choice = "choice",
  column_alternative = "alternative",
  column_decider = "deciderID",
  column_occasion = NULL,
  alternatives = unique(data_frame[[column_alternative]]),
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)

wide_to_long(
  data_frame,
  column_choice = "choice",
  column_alternative = "alternative",
  alternatives = NULL,
  delimiter = "_",
  choice_type = c("discrete", "ordered", "ranked")
)

Arguments

data_frame: [data.frame]
Contains the choice data.
format: [character(1)]
Format of data_frame. Use "wide" when each row contains all alternatives of an occasion and "long" when each row contains a single alternative.
column_choice: [character(1) | NULL]
Column name with the observed choices. In wide layout this column should contain a single value per observation: for discrete data the value is the label of the chosen alternative, for ordered data it is the ordered factor or integer score, and for ranked data it is omitted in favour of one column per alternative (see choice_type). In long layout the same column is evaluated once per alternative: discrete data must use a binary indicator (1 for the chosen alternative, 0 otherwise), ordered data repeats the ordinal value for every alternative, and ranked data stores the integer rank 1:J for each alternative within an observation. Set to NULL when no observed choices are available (e.g., for purely covariate tables).
column_decider: [character(1)]
Column name with decider identifiers.
column_occasion: [character(1) | NULL]
Column name with occasion identifiers. Set to NULL in cross-sectional data.
column_alternative: [character(1) | NULL]
Column name with alternative identifiers when format = "long".
column_ac_covariates: [character() | NULL]
Column names with alternative-constant covariates.
column_as_covariates: [character() | NULL]
Column names of data_frame with alternative-specific covariates.
delimiter: [character(1)]
Delimiter separating alternative identifiers from covariate names in wide format. May consist of one or more characters.
cross_section: [logical(1)]
Treat choice data as cross-sectional?
choice_type: [character(1)]
Requested response type: "discrete", "ordered", or "ranked". generate_choice_data() additionally accepts "auto" (its default), which infers the type from choice_alternatives().
choice_effects: [choice_effects]
A choice_effects object describing the model.
choice_identifiers: [choice_identifiers]
A choice_identifiers object that provides the decider and occasion identifiers.
choice_covariates: [choice_covariates]
Covariates to include in the generated data.
choice_parameters: [choice_parameters]
Model parameters supplying utilities and covariance structures.
choice_preferences: [choice_preferences]
Decider-specific preference draws used for simulation.
alternatives: [character(J)]
Unique labels for the choice alternatives.

Value

A tibble that inherits from choice_data.

Details

choice_data() is the main entry point for observed data. It accepts either long or wide layouts and validates the input before returning a tidy tibble with consistent identifiers. Columns that refer to the same alternative are aligned using delimiter so that downstream helpers can detect them automatically. For ranked or ordered choices, the function checks that rankings are complete and warns about inconsistencies.
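
To make the completeness requirement for ranked data concrete, the following base R sketch checks whether each observation's ranks form the full sequence 1:J. It only illustrates the idea and is not the package's actual validation code; the toy data frame and the helper is_complete() are hypothetical.

```r
# Hypothetical ranked data in long layout: one row per alternative,
# with the rank of that alternative stored in the choice column.
ranked <- data.frame(
  deciderID   = rep(1:2, each = 3),
  alternative = rep(c("A", "B", "C"), times = 2),
  choice      = c(2, 1, 3,   1, 1, 3),  # decider 2 uses rank 1 twice
  stringsAsFactors = FALSE
)

# Hypothetical helper: a ranking is complete when its sorted ranks
# are exactly 1, 2, ..., J.
is_complete <- function(ranks) {
  identical(sort(as.integer(ranks)), seq_along(ranks))
}

# Apply the check separately within each decider.
complete <- tapply(ranked$choice, ranked$deciderID, is_complete)
complete
```

Here the first decider supplies a complete ranking, while the second does not because one rank is duplicated and another is missing.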

Internally, choice_data() converts long inputs to wide format. This guarantees that subsequent steps (such as computing probabilities) receive the same structure regardless of the original layout, which keeps the workflow concise.
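
As a rough picture of what a long-to-wide conversion involves, here is a minimal base R sketch. It is not the package's implementation: the toy data frame is hypothetical, and the column naming (covariate, then delimiter, then alternative) is an assumption taken from the example output on this page, such as wait_plane.

```r
# Hypothetical toy data in long layout: one row per alternative,
# with a binary indicator marking the chosen alternative.
long <- data.frame(
  deciderID   = c(1, 1, 2, 2),
  alternative = c("A", "B", "A", "B"),
  choice      = c(1, 0, 0, 1),
  price       = c(5.0, 7.5, 6.0, 4.5),
  stringsAsFactors = FALSE
)
delimiter <- "_"

# One row per decider: keep the label of the chosen alternative.
chosen <- long[long$choice == 1, c("deciderID", "alternative")]
names(chosen)[2] <- "choice"

# Spread the alternative-specific covariate into one column per alternative,
# named <covariate><delimiter><alternative> (assumed naming pattern).
wide <- chosen
for (alt in unique(long$alternative)) {
  sub <- long[long$alternative == alt, c("deciderID", "price")]
  names(sub)[2] <- paste("price", alt, sep = delimiter)
  wide <- merge(wide, sub, by = "deciderID")
}
wide
```

The result has one row per decider with the columns deciderID, choice, price_A, and price_B. In practice long_to_wide() would be the function to call; the loop above only shows the reshaping idea.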

generate_choice_data() simulates choice data.
long_to_wide() and wide_to_long() transform the data to wide and long format, respectively.

The generated choice_data object inherits a choice_type attribute for the requested simulation mode. Ordered alternatives (ordered = TRUE) yield ordered responses, unordered alternatives default to discrete multinomial outcomes, and ranked simulations return complete rankings for every observation.

Examples

### simulate data from a multinomial probit model
choice_effects <- choice_effects(
  choice_formula = choice_formula(
    formula = choice ~ A | B, error_term = "probit",
    random_effects = c("A" = "cn")
  ),
  choice_alternatives = choice_alternatives(J = 3)
)
generate_choice_data(choice_effects)
#> # A tibble: 100 × 7
#>    deciderID occasionID choice      B     A_A     A_B     A_C
#>  * <chr>     <chr>      <chr>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 1         1          C       0.118  2.76    0.0465  0.578 
#>  2 2         1          C      -0.206 -1.91    0.862  -0.243 
#>  3 3         1          C      -2.27   0.0192  0.0296  0.550 
#>  4 4         1          A       1.07   2.68   -0.361   0.213 
#>  5 5         1          C      -1.18  -0.665   1.11   -0.246 
#>  6 6         1          C       0.489 -0.976   1.07    0.132 
#>  7 7         1          A       1.34  -1.70   -1.47    0.284 
#>  8 8         1          C       0.607  0.237   1.32    0.524 
#>  9 9         1          A       1.92  -0.110   0.172  -0.0903
#> 10 10        1          C      -0.548  1.30    0.749   0.556 
#> # ℹ 90 more rows

### transform between long/wide format
long_to_wide(
  data_frame = travel_mode_choice,
  column_alternative = "mode",
  column_decider = "individual"
)
#> # A tibble: 210 × 16
#>    individual income  size wait_plane wait_train wait_bus wait_car cost_plane
#>         <int>  <int> <int>      <int>      <int>    <int>    <int>      <int>
#>  1          1     35     1         69         34       35        0         59
#>  2          2     30     2         64         44       53        0         58
#>  3          3     40     1         69         34       35        0        115
#>  4          4     70     3         64         44       53        0         49
#>  5          5     45     2         64         44       53        0         60
#>  6          6     20     1         69         40       35        0         59
#>  7          7     45     1         45         34       35        0        148
#>  8          8     12     1         69         34       35        0        121
#>  9          9     40     1         69         34       35        0         59
#> 10         10     70     2         69         34       35        0         58
#> # ℹ 200 more rows
#> # ℹ 8 more variables: cost_train <int>, cost_bus <int>, cost_car <int>,
#> #   travel_plane <int>, travel_train <int>, travel_bus <int>, travel_car <int>,
#> #   choice <chr>
wide_to_long(
  data_frame = train_choice
)
#> # A tibble: 5,858 × 8
#>    deciderID occasionID choice alternative price  time change comfort
#>        <int>      <int>  <int> <chr>       <dbl> <dbl>  <int> <fct>  
#>  1         1          1      1 A            52.9  2.5       0 1      
#>  2         1          1      0 B            88.1  2.5       0 1      
#>  3         1          2      1 A            52.9  2.5       0 1      
#>  4         1          2      0 B            70.5  2.17      0 1      
#>  5         1          3      1 A            52.9  1.92      0 1      
#>  6         1          3      0 B            88.1  1.92      0 0      
#>  7         1          4      0 A            88.1  2.17      0 1      
#>  8         1          4      1 B            70.5  2.5       0 0      
#>  9         1          5      0 A            52.9  2.5       0 1      
#> 10         1          5      1 B            70.5  2.5       0 0      
#> # ℹ 5,848 more rows
