The {fHMM} package allows for multiple hidden Markov model specifications, including different data transformations, state-dependent distributions, and a hierarchical model structure. This vignette outlines which specifications are possible and how to set them.

library(fHMM)
#> Thanks for using {fHMM} version 1.0.3!
#> See https://loelschlaeger.de/fHMM for help.
#> Type 'citation("fHMM")' for citing this R package.

## The set_controls function

The {fHMM} philosophy is to start the modeling process by setting all data, model, and estimation specifications. This is done by defining a named list of controls and passing it to the set_controls() function. The function checks the specifications and returns an fHMM_controls object which stores all specifications and thereby provides required information for other {fHMM} functionalities.

## Example specifications

For demonstration, we list example specifications using data from the Deutscher Aktienindex DAX [@jan92]:

download_data(symbol = "^GDAXI", file = "dax.csv")
#> * symbol: ^GDAXI
#> * from: 1987-12-30
#> * to: 2022-09-30
#> * path: /Users/runner/work/fHMM/fHMM/vignettes/dax.csv

### HMMs for empirical data

The following lines of code specify a 3-state HMM with state-dependent t-distributions on the data in the file dax.csv. The dates are provided in the column called Date and the data in the column called Close. The logreturns = TRUE line transforms the index data to log-returns. The runs = 50 line sets the number of numerical optimization runs to 50.

controls <- list(
  states = 3,
  sdds   = "t",
  data   = list(file        = "dax.csv",
                date_column = "Date",
                data_column = "Close",
                logreturns  = TRUE),
  fit    = list(runs        = 50)
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE
#> * data type: empirical
#> * number of states: 3
#> * sdds: t()
#> * number of runs: 50
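
The log-return transformation requested by logreturns = TRUE can be illustrated in base R. The following is a minimal sketch that assumes the usual definition of a log-return (the difference of consecutive log prices); the price vector is hypothetical, and details of how {fHMM} treats special cases such as missing values are not covered here.

```r
# Hypothetical closing prices; in the example above these would come
# from the Close column of dax.csv.
close <- c(100, 102, 101, 105)

# Log-return at time t: log(close[t]) - log(close[t - 1]).
log_returns <- diff(log(close))

# The transformed series is one observation shorter than the price series.
length(log_returns)
```

A convenient property of this transformation is that log-returns add up over time: sum(log_returns) equals the log of the last price divided by the first.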

### Simulated HMM data

The following specifies a 2-state HMM with state-dependent Gamma distributions, where the means for states 1 and 2 are fixed to 0.5 and 2, respectively. The model will be fitted to 500 data points (horizon = 500), which are simulated from this model specification.

controls <- list(
  states  = 2,
  sdds    = "gamma(mu = 0.5|2)",
  horizon = 500
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE
#> * data type: simulated
#> * number of states: 2
#> * sdds: gamma(mu = 0.5|2)
#> * number of runs: 100
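
To make concrete what simulating from such a specification means, here is a minimal base-R sketch of a 2-state HMM with Gamma state-dependent distributions and means 0.5 and 2. It does not use {fHMM} and is not the package's simulation routine: the transition probability matrix tpm and the standard deviations sigma are hypothetical values chosen for illustration, and the mean and standard deviation parametrization is an assumption based on the parameter names mu and sigma.

```r
set.seed(1)

horizon <- 500
mu      <- c(0.5, 2)    # fixed state-dependent means, as in "gamma(mu = 0.5|2)"
sigma   <- c(0.2, 0.5)  # hypothetical standard deviations (assumption)

# Hypothetical transition probability matrix (assumption);
# row i holds the probabilities of moving from state i.
tpm <- matrix(c(0.95, 0.05,
                0.10, 0.90), nrow = 2, byrow = TRUE)

# Simulate the hidden Markov chain.
states <- integer(horizon)
states[1] <- sample(1:2, size = 1)
for (t in 2:horizon) {
  states[t] <- sample(1:2, size = 1, prob = tpm[states[t - 1], ])
}

# Convert (mean, sd) to the (shape, rate) arguments of rgamma(),
# using mean = shape / rate and variance = shape / rate^2.
shape <- mu^2 / sigma^2
rate  <- mu / sigma^2

# Draw each observation from the distribution of its hidden state.
observations <- rgamma(horizon, shape = shape[states], rate = rate[states])

# Empirical mean of the observations within each state.
tapply(observations, states, mean)
```

The per-state empirical means should lie close to the fixed values 0.5 and 2.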

### Hierarchical HMMs

Specifying hierarchical HMMs is analogous, except that new parameters can be specified (for example period, see below) and some parameters can now be specified for both hierarchies.

controls <- list(
  hierarchy = TRUE,
  horizon   = c(100, 10),
  sdds      = c("t(df = 1)", "t(df = Inf)"),
  period    = "m"
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: TRUE
#> * data type: simulated
#> * number of states: 2 2
#> * sdds: t(df = 1) t(df = Inf)
#> * number of runs: 100
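
According to the help page, period applies when hierarchy = TRUE and horizon[2] = NA_integer_, in which case the fine-scale horizon varies with the calendar. The idea behind period = "m" can be sketched in base R by grouping time points by calendar month. This is only an illustration with hypothetical dates (every calendar day, whereas financial data would contain trading days only), not the package's internal logic.

```r
# Hypothetical daily time points covering the first quarter of 2022.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day")

# Label each day with its calendar month; each label defines one
# coarse-scale period.
month_id <- format(dates, "%Y-%m")

# Number of fine-scale time points per coarse-scale period.
fine_scale_lengths <- table(month_id)
fine_scale_lengths
```

The resulting counts differ between months, which is why a single fixed fine-scale horizon cannot represent a monthly period.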

The help page of the set_controls() function provides an overview of all possible specifications.

?set_controls
 set_controls R Documentation

## Set and check controls

### Arguments

controls: A list of controls. Either none, all, or selected parameters can be specified. Unspecified parameters are set to default values (the values in brackets). If hierarchy = TRUE, parameters with a (*) must be a vector of length 2, where the first entry corresponds to the coarse-scale and the second entry to the fine-scale layer.

- hierarchy (FALSE): A boolean, set to TRUE for a hierarchical HMM.
- states (*) (2): The number of states of the underlying Markov chain.
- sdds (*) ("t(df = Inf)"): Specifies the state-dependent distribution, one of "t" (the t-distribution), "gamma" (the gamma distribution), or "lnorm" (the log-normal distribution). You can fix the parameters (mean mu, standard deviation sigma, degrees of freedom df) of these distributions, e.g. "t(df = Inf)" or "gamma(mu = 0, sigma = 1)", respectively. To fix different values of one parameter for different states, separate them by "|", e.g. "t(mu = -1|1)".
- horizon (*) (100): A numeric, specifying the length of the time horizon. The first entry of horizon is ignored if data is specified.
- period ("m"): Only relevant if hierarchy = TRUE and horizon[2] = NA_integer_. In this case, it specifies a flexible, periodic fine-scale time horizon and can be one of "w" for a week, "m" for a month, "q" for a quarter, "y" for a year.
- data (NA): A list of controls specifying the data. If data = NA, data gets simulated. Otherwise:
  - file (*): A character, the path to a .csv-file with financial data, which must have a column named date_column (with dates) and data_column (with financial data).
  - date_column (*) ("Date"): A character, the name of the column in file with dates. Can be NA_character_, in which case consecutive integers are used as time points.
  - data_column (*) ("Close"): A character, the name of the column in file with financial data.
  - from (NA_character_): A character of the format "YYYY-MM-DD", setting a lower data limit. No lower limit if from = NA_character_. Ignored if controls$data$date_column is NA.
  - to (NA_character_): A character of the format "YYYY-MM-DD", setting an upper data limit. No upper limit if to = NA_character_. Ignored if controls$data$date_column is NA_character_.
  - logreturns (*) (FALSE): A boolean, if TRUE the data is transformed to log-returns.
  - merge (function(x) mean(x)): Only relevant if hierarchy = TRUE. In this case, a function which merges a numeric vector of fine-scale data x into one coarse-scale observation. For example,
    - merge = function(x) mean(x) defines the mean of the fine-scale data as the coarse-scale observation,
    - merge = function(x) mean(abs(x)) the mean of the absolute values,
    - merge = function(x) sum(abs(x)) the sum of the absolute values,
    - merge = function(x) (tail(x,1)-head(x,1))/head(x,1) the relative change from the first to the last fine-scale observation.
- fit: A list of controls specifying the model fitting:
  - runs (100): An integer, setting the number of optimization runs.
  - origin (FALSE): A boolean, if TRUE the optimization is initialized at the true parameter values. Only for simulated data. If origin = TRUE, this sets runs = 1 and accept = 1:5.
  - accept (1:3): An integer (vector), specifying which optimization runs are accepted based on the output code of nlm.
  - gradtol (1e-6): A positive numeric value, passed on to nlm.
  - iterlim (200): A positive integer, passed on to nlm.
  - print.level (0): One of 0, 1, and 2, passed on to nlm.
  - steptol (1e-6): A positive numeric value, passed on to nlm.
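
The effect of the merge functions listed above can be checked on a small vector in base R. The values in x are hypothetical fine-scale observations of a single coarse-scale period, and the function names merge_mean, merge_mean_abs, merge_sum_abs, and merge_rel are chosen here for illustration only.

```r
# Hypothetical fine-scale observations belonging to one coarse-scale period.
x <- c(1, -2, 3, -4)

merge_mean     <- function(x) mean(x)
merge_mean_abs <- function(x) mean(abs(x))
merge_sum_abs  <- function(x) sum(abs(x))
merge_rel      <- function(x) (tail(x, 1) - head(x, 1)) / head(x, 1)

merge_mean(x)      # -0.5
merge_mean_abs(x)  # 2.5
merge_sum_abs(x)   # 10
merge_rel(x)       # (-4 - 1) / 1 = -5
```

Each function reduces the whole vector to a single number, which is what the coarse-scale layer requires as one observation per period.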