
The {fHMM} package supports multiple hidden Markov model specifications, including different data transformations, state-dependent distributions, and a hierarchical model structure. This vignette outlines which specifications are possible and how to set them.

library(fHMM)
#> Thanks for using {fHMM} version 1.0.3!
#> See https://loelschlaeger.de/fHMM for help.
#> Type 'citation("fHMM")' for citing this R package.

The set_controls function

The {fHMM} philosophy is to start the modeling process by setting all data, model, and estimation specifications. This is done by defining a named list of controls and passing it to the set_controls() function. The function checks the specifications and returns an fHMM_controls object which stores all specifications and thereby provides required information for other {fHMM} functionalities.

Example specifications

For demonstration, we list example specifications using data from the Deutscher Aktienindex DAX [@jan92]:

download_data(symbol = "^GDAXI", file = "dax.csv")
#> Download successful.
#> * symbol: ^GDAXI 
#> * from: 1987-12-30 
#> * to: 2022-09-30 
#> * path: /Users/runner/work/fHMM/fHMM/vignettes/dax.csv

HMMs for empirical data

The following lines of code specify a 3-state HMM with state-dependent t-distributions on the data in the file dax.csv. The dates are provided in the column called Date and the data in the column called Close. The logreturns = TRUE line transforms the index data to log-returns. The runs = 50 line sets the number of numerical optimization runs to 50.

controls <- list(
  states = 3,
  sdds   = "t",
  data   = list(file        = "dax.csv",
                date_column = "Date",
                data_column = "Close",
                logreturns  = TRUE),
  fit    = list(runs        = 50)
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE 
#> * data type: empirical 
#> * number of states: 3 
#> * sdds: t() 
#> * number of runs: 50
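The log-return transformation requested by logreturns = TRUE can be illustrated in base R. This is a minimal sketch that assumes the usual definition of a log-return, log(x[t] / x[t - 1]); the price vector is hypothetical, and how {fHMM} treats the first observation internally is not shown here.

```r
# Hypothetical closing prices; in the example above they come from the Close column.
close <- c(100, 102, 101, 105, 104)

# Log-return at time t: log(close[t] / close[t - 1]).
log_returns <- diff(log(close))

# One observation is lost because the first price has no predecessor.
length(log_returns)
```

Log-returns are additive over time, so their sum equals the log-return over the whole period.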

Simulated HMM data

The following lines specify a 2-state HMM with state-dependent gamma distributions, where the expected values for states 1 and 2 are fixed to 0.5 and 2, respectively. The model will be fitted to 500 data points (horizon = 500), which are simulated from this model specification.

controls <- list(
  states  = 2,
  sdds    = "gamma(mu = 0.5|2)",
  horizon = 500
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: FALSE 
#> * data type: simulated 
#> * number of states: 2 
#> * sdds: gamma(mu = 0.5|2) 
#> * number of runs: 100
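To make the simulated-data case more concrete, the following base R sketch draws 500 observations from a 2-state HMM with gamma state-dependent distributions and means 0.5 and 2. It is not the {fHMM} implementation: the transition matrix, the standard deviations, and the uniform initial state are assumptions chosen for illustration, and the conversion from mean and standard deviation to shape and rate is the standard moment-matching identity.

```r
set.seed(1)

horizon <- 500
mu      <- c(0.5, 2)    # state-dependent means, as fixed in the controls above
sigma   <- c(0.2, 0.5)  # assumed standard deviations (hypothetical values)

# Assumed transition probability matrix (hypothetical values, rows sum to 1).
Gamma <- matrix(c(0.95, 0.05,
                  0.10, 0.90), nrow = 2, byrow = TRUE)

# Moment matching: mean = shape / rate, variance = shape / rate^2.
shape <- mu^2 / sigma^2
rate  <- mu / sigma^2

# Simulate the hidden state sequence; the initial state is drawn uniformly
# for simplicity (an assumption of this sketch).
states <- integer(horizon)
states[1] <- sample(1:2, size = 1)
for (t in 2:horizon) {
  states[t] <- sample(1:2, size = 1, prob = Gamma[states[t - 1], ])
}

# Draw each observation from the gamma distribution of its hidden state.
observations <- rgamma(horizon, shape = shape[states], rate = rate[states])
```

Plotting observations against time, colored by states, shows the two regimes around the fixed means.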

Hierarchical HMMs

Specifying hierarchical HMMs works analogously, except that additional parameters can be specified (for example period, see below) and some parameters can now be specified for both layers of the hierarchy.

controls <- list(
  hierarchy = TRUE,
  horizon   = c(100, 10),
  sdds      = c("t(df = 1)", "t(df = Inf)"),
  period    = "m"
)
set_controls(controls)
#> fHMM controls:
#> * hierarchy: TRUE 
#> * data type: simulated 
#> * number of states: 2 2 
#> * sdds: t(df = 1) t(df = Inf) 
#> * number of runs: 100
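In a hierarchical HMM, each coarse-scale observation summarizes a chunk of fine-scale data through the merge control documented in the help page of set_controls(). The following base R sketch shows the idea on a hypothetical matrix with one row per coarse-scale period. The names fine, merge_mean, and merge_abs_mean are illustrative and not part of {fHMM}.

```r
# Hypothetical fine-scale data: 3 coarse-scale periods with 4 observations each.
fine <- matrix(c( 2, -1,  4, -3,
                  1,  1,  1,  1,
                 -2,  0,  2,  4), nrow = 3, byrow = TRUE)

# Two merge functions of the kind listed in the help page.
merge_mean     <- function(x) mean(x)
merge_abs_mean <- function(x) mean(abs(x))

# Apply each merge function row by row to get one coarse-scale value per period.
coarse_mean     <- apply(fine, 1, merge_mean)
coarse_abs_mean <- apply(fine, 1, merge_abs_mean)
```

The two results differ whenever a period contains values of both signs, which is why the choice of merge function determines what the coarse-scale layer measures.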

The help page of the set_controls() function provides an overview of all possible specifications.

?set_controls
set_controls R Documentation

Set and check controls

Arguments

controls

A list of controls. Either none, all, or selected parameters can be specified. Unspecified parameters are set to default values (the values in brackets). If hierarchy = TRUE, parameters with a (*) must be a vector of length 2, where the first entry corresponds to the coarse-scale and the second entry to the fine-scale layer.

  • hierarchy (FALSE): A boolean, set to TRUE for a hierarchical HMM.

  • states (*) (2): The number of states of the underlying Markov chain.

  • sdds (*) (“t(df = Inf)”): Specifies the state-dependent distribution, one of “t” (the t-distribution), “gamma” (the gamma distribution), or “lnorm” (the log-normal distribution). You can fix the parameters (mean mu, standard deviation sigma, degrees of freedom df) of these distributions, e.g. “t(df = Inf)” or “gamma(mu = 0, sigma = 1)”, respectively. To fix different values of one parameter for different states, separate them by “|”, e.g. “t(mu = -1|1)”.

  • horizon (*) (100): A numeric, specifying the length of the time horizon. The first entry of horizon is ignored if data is specified.

  • period (“m”): Only relevant if hierarchy = TRUE and horizon[2] = NA_integer_. In this case, it specifies a flexible, periodic fine-scale time horizon and can be one of

    • “w” for a week,

    • “m” for a month,

    • “q” for a quarter,

    • “y” for a year.

  • data (NA): A list of controls specifying the data. If data = NA, data gets simulated. Otherwise:

    • file (*): A character, the path to a .csv-file with financial data, which must contain the columns named in date_column (with dates) and data_column (with financial data).

    • date_column (*) (“Date”): A character, the name of the column in file with dates. Can be NA_character_ in which case consecutive integers are used as time points.

    • data_column (*) (“Close”): A character, the name of the column in file with financial data.

    • from (NA_character_): A character of the format “YYYY-MM-DD”, setting a lower data limit. No lower limit if from = NA_character_. Ignored if controls$data$date_column is NA_character_.

    • to (NA_character_): A character of the format “YYYY-MM-DD”, setting an upper data limit. No upper limit if to = NA_character_. Ignored if controls$data$date_column is NA_character_.

    • logreturns (*) (FALSE): A boolean, if TRUE the data is transformed to log-returns.

    • merge (function(x) mean(x)): Only relevant if hierarchy = TRUE. In this case, a function, which merges a numeric vector of fine-scale data x into one coarse-scale observation. For example,

      • merge = function(x) mean(x) defines the mean of the fine-scale data as the coarse-scale observation,

      • merge = function(x) mean(abs(x)) for the mean of the absolute values,

      • merge = function(x) sum(abs(x)) for the sum of the absolute values,

      • merge = function(x) (tail(x,1)-head(x,1))/head(x,1) for the relative change of the first to the last fine-scale observation.

  • fit: A list of controls specifying the model fitting:

    • runs (100): An integer, setting the number of optimization runs.

    • origin (FALSE): A boolean, if TRUE the optimization is initialized at the true parameter values. Only for simulated data. If origin = TRUE, this sets runs = 1 and accept = 1:5.

    • accept (1:3): An integer (vector), specifying which optimization runs are accepted based on the output code of nlm.

    • gradtol (1e-6): A positive numeric value, passed on to nlm.

    • iterlim (200): A positive integer, passed on to nlm.

    • print.level (0): One of 0, 1, and 2, passed on to nlm.

    • steptol (1e-6): A positive numeric value, passed on to nlm.
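The fit controls above map onto arguments and return values of stats::nlm(). The sketch below shows how an acceptance rule based on accept could look. It uses a hypothetical quadratic objective instead of an HMM likelihood, and it is an assumption about the mechanism rather than the {fHMM} source code.

```r
# Hypothetical objective: a simple quadratic with its minimum at c(1, -2).
objective <- function(par) sum((par - c(1, -2))^2)

accept <- 1:3  # default acceptance set from the help page

# The remaining values mirror the defaults listed in the help page.
fit <- nlm(f = objective, p = c(0, 0), gradtol = 1e-6, iterlim = 200,
           print.level = 0, steptol = 1e-6)

# nlm() reports its termination status in the `code` element (an integer from 1 to 5).
accepted <- fit$code %in% accept
```

According to the nlm() documentation, codes 4 and 5 indicate that the iteration limit was reached or that the maximum step size was exceeded repeatedly, which is why they are excluded from the default acceptance set.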