Model selection involves the choice of a family for the state-dependent distribution and the selection of the number of states. This vignette introduces model selection in fHMM.
Information criteria
Common model selection tools are information criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), both of which aim at finding a compromise between model fit and model complexity.
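The AIC is defined as $\mathrm{AIC} = -2 \log \mathcal{L} + 2p$, where $\mathcal{L}$ is the maximized likelihood and $p$ denotes the number of parameters, while the BIC is defined as $\mathrm{BIC} = -2 \log \mathcal{L} + \log(T) \cdot p$, where $T$ is the total number of observations.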
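As a minimal sketch, the two criteria can be computed by hand, assuming the standard definitions AIC = -2 log L + 2p and BIC = -2 log L + log(T) p. The helper functions `aic()` and `bic()` and all input values below are hypothetical illustrations and not part of fHMM.

```r
# Hypothetical helpers (not part of fHMM) for the standard definitions.
aic <- function(log_lik, n_par) -2 * log_lik + 2 * n_par
bic <- function(log_lik, n_par, n_obs) -2 * log_lik + log(n_obs) * n_par

# Illustrative values, not taken from any fitted model.
log_lik <- -1200.5  # maximized log-likelihood
n_par   <- 6        # number of model parameters
n_obs   <- 1000     # total number of observations

aic(log_lik, n_par)         # 2413
bic(log_lik, n_par, n_obs)  # larger than the AIC here, because log(1000) > 2
```

For both criteria, smaller values indicate the preferred model. The BIC penalizes each parameter more heavily than the AIC as soon as log(T) exceeds 2, that is, for T of 8 or more.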
Challenges associated with model selection
In practice, however, information criteria often favor overly complex models. Real data typically exhibit more structure than can actually be captured by the model. This can be the case if the true state-dependent distributions are too complex to be fully modeled by some (rather simple) parametric distribution, or if certain temporal patterns are neglected in the model formulation. Additional states may be able to capture this structure, which can lead to an increased goodness of fit that outweighs the higher model complexity. However, as models with too many states are difficult to interpret and are therefore often not desired, information criteria should be treated with some caution and considered only as rough guidance. For an in-depth discussion of pitfalls, practical challenges, and pragmatic solutions regarding model selection, see Pohle et al. (2017).
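To make the trade-off concrete, the following sketch computes how much the log-likelihood must improve before a larger model is preferred, assuming the standard definitions AIC = -2 log L + 2p and BIC = -2 log L + log(T) p. Rearranging these shows that the larger model wins under the AIC when the log-likelihood gain exceeds the number of added parameters, and under the BIC when it exceeds that number times log(T) / 2. The function names and the input values are hypothetical.

```r
# Hypothetical helpers (not part of fHMM): minimum log-likelihood gain
# required for a model with `extra_par` additional parameters to be preferred.
min_gain_aic <- function(extra_par) extra_par
min_gain_bic <- function(extra_par, n_obs) extra_par * log(n_obs) / 2

# Illustrative values: 9 additional parameters, 1000 observations (assumed).
extra_par <- 9
n_obs     <- 1000

min_gain_aic(extra_par)         # 9
min_gain_bic(extra_par, n_obs)  # 4.5 * log(1000), roughly 31
```

Both thresholds are small relative to the log-likelihood gains that extra states can achieve on long series, which is one way to see why the criteria tend to select many states.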
The compare_models() function
The fHMM package provides a convenient tool for comparing different models via the compare_models() function. Any number of models can be passed directly to compare_models(), which returns an overview of the above model selection criteria. Below, we compare a 2-state HMM with normal state-dependent distributions against a 3-state HMM with state-dependent t-distributions for the DAX data. The more complex model is clearly preferred, as it attains the lower value under both criteria:
data(dax_model_2n)
data(dax_model_3t)
compare_models(dax_model_2n, dax_model_3t)
#>              parameters loglikelihood       AIC       BIC
#> dax_model_2n          6      17403.61 -34795.21 -34755.13
#> dax_model_3t         15      17650.02 -35270.05 -35169.85
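The printed overview can be checked by hand. The sketch below copies the numbers from the output above into a data frame (it does not rely on the return type of compare_models(), which is not stated here), picks the model with the smallest value of each criterion, and recomputes the AIC from the log-likelihood and parameter count, assuming the standard definition AIC = -2 log L + 2p. The object names are hypothetical.

```r
# Values copied from the printed compare_models() output above.
comparison <- data.frame(
  parameters    = c(6, 15),
  loglikelihood = c(17403.61, 17650.02),
  AIC           = c(-34795.21, -35270.05),
  BIC           = c(-34755.13, -35169.85),
  row.names     = c("dax_model_2n", "dax_model_3t")
)

# The preferred model minimizes the criterion.
best_aic <- rownames(comparison)[which.min(comparison$AIC)]
best_bic <- rownames(comparison)[which.min(comparison$BIC)]

# Recompute the AIC; small deviations stem from rounding in the printed table.
aic_check <- -2 * comparison$loglikelihood + 2 * comparison$parameters
max_dev   <- max(abs(aic_check - comparison$AIC))
```

Here both `best_aic` and `best_bic` point to dax_model_3t, and the recomputed AIC values agree with the printed ones up to rounding.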