Skip to contents

The {fHMM} package fits a hidden Markov model via the maximum-likelihood method, i.e. by numerically maximizing the likelihood function. This vignette1 derives the likelihood formula, discusses challenges associated with the likelihood maximization and presents the model estimation workflow.

Likelihood evaluation using the forward algorithm

Deriving the likelihood function of an hidden Markov model is part of the hierarchical case, hence the following only discusses the general case.

An HHMM can be treated as an HMM with two conditionally independent data streams; the coarse-scale observations on the one hand and the corresponding chunk of fine-scale observations connected to a fine-scale HMM on the other hand. To derive the likelihood of an HHMM, we start by computing the likelihood of each chunk of fine-scale observations being generated by each fine-scale HMM.

To fit the \(i\)-th fine-scale HMM, with model parameters \(\theta^{*(i)}=(\delta^{*(i)}, \Gamma^{*(i)},(f^{*(i,k)})_k)\) to the \(t\)-th chunk of fine-scale observations, which is denoted by \((X_{t,t^*})_{t^*}\), we consider the fine-scale forward probabilities \[\begin{align*} \alpha^{*(i)}_{k,t^*}=f^{*(i)}(X^*_{t,1},\dots,X^*_{t,t^*}, S^*_{t,t^*}=k), \end{align*}\] where \(t^*=1,\dots,T^*\) and \(k=1,\dots,N^*\). Using the fine-scale forward probabilities, the fine-scale likelihoods can be obtained from the law of total probability as \[\begin{align*} \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{t,t^*})_{t^*})=\sum_{k=1}^{N^*}\alpha^{*(i)}_{k,T^*}. \end{align*}\] The forward probabilities can be calculated in a recursively as \[\begin{align*} \alpha^{*(i)}_{k,1} &= \delta^{*(i)}_k f^{*(i,k)}(X^*_{t,1}), \\ \alpha^{*(i)}_{k,t^*} &= f^{*(i,k)}(X^*_{t,t^*})\sum_{j=1}^{N^*}\gamma^{*(i)}_{jk}\alpha^{*(i)}_{j,t^*-1}, \quad t^*=2,\dots,T^*. \end{align*}\]

The transition from the likelihood function of an HMM to the likelihood function of an HHMM is straightforward: Consider the coarse-scale forward probabilities \[\begin{align*} \alpha_{i,t}=f(X_1,\dots,X_t,(X^*_{1,t^*})_{t^*},\dots,(X^*_{t,t^*})_{t^*}, S_t=i), \end{align*}\] where \(t=1,\dots,T\) and \(i=1,\dots,N\). The likelihood function of the HHMM results as \[\begin{align*} \mathcal{L}^\text{HHMM}(\theta,(\theta^{*(i)})_i\mid (X_t)_t,((X^*_{t,t^*})_{t^*})_t)=\sum_{i=1}^{N}\alpha_{i,T}. \end{align*}\] The coarse-scale forward probabilities can be calculated similarly by applying the recursive scheme \[\begin{align*} \alpha_{i,1} &= \delta_i \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{1,t^*})_{t^*})f^{(i)}(X_1), \\ \alpha_{i,t} &= \mathcal{L}^\text{HMM}(\theta^{*(i)}\mid (X^*_{t,t^*})_{t^*}) f^{(i)}(X_t)\sum_{j=1}^{N}\gamma_{ji}\alpha_{j,t-1}, \quad t=2,\dots,T. \end{align*}\]

Challenges associated with the likelihood maximization

To account for parameter constraints associated with the transition probabilities (and potentially the parameters of the state-dependent distributions), we use parameter transformations. To ensure that the entries of the t.p.m.s fulfill non-negativity and the unity condition, we estimate unconstrained values \((\eta_{ij})_{i\neq j}\) for the non-diagonal entries of \(\Gamma\) and derive the probabilities using the multinomial logit link \[\begin{align*} \gamma_{ij}=\frac{\exp[\eta_{ij}]}{1+\sum_{k\neq i}\exp[\eta_{ik}]},~i\neq j \end{align*}\] rather than estimating the probabilities \((\gamma_{ij})_{i,j}\) directly. The diagonal entries result from the unity condition as \[\begin{align*} \gamma_{ii}=1-\sum_{j\neq i}\gamma_{ij}. \end{align*}\] Furthermore, variances are strictly positive, which can be achieved by applying an exponential transformation to the unconstrained estimator.

When numerically maximizing the likelihood using some Newton-Raphson-type method, we often face numerical under- or overflow, which can be addressed by maximizing the logarithm of the likelihood and incorporating constants in a conducive way (see Zucchini, MacDonald, and Langrock (2016) and Oelschläger and Adam (2021) for details).

As the likelihood is maximized with respect to a relatively large number of parameters, the obtained maximum can be a local rather than the global one. To avoid this problem, it is recommended to run the maximization multiple times from different, possibly randomly selected starting points, and to choose the model that corresponds to the highest likelihood (see Zucchini, MacDonald, and Langrock (2016) and Oelschläger and Adam (2021) for details).


For illustration, we fit a 3-state HMM with state-dependent t-distributions to the DAX data that we prepared in the previous vignette on data management:

controls <- list(
  states = 3,
  sdds   = "t",
  data   = list(file        = system.file("extdata", "dax.csv", package = "fHMM"),
                date_column = "Date",
                data_column = "Close",
                from        = "2000-01-01",
                to          = "2021-12-31",
                logreturns  = TRUE),
  fit    = list("runs" = 100)
controls <- set_controls(controls)
data <- prepare_data(controls)

The data object can be directly passed to the fit_model() function that numerically maximizes the model’s (log-) likelihood function runs = 100 times.2 This task can be parallelized by setting the ncluster argument.3

dax_model_3t <- fit_model(data, seed = 1, verbose = FALSE)

The estimated model is saved in the {fHMM} package and can be accessed as follows:


Calling the summary() method provides an overview of the model fit, whereas coef() returns a data frame of the estimated model coefficients:

#> Summary of fHMM model
#>   simulated hierarchy       LL       AIC       BIC
#> 1     FALSE     FALSE 16913.33 -33796.65 -33697.13
#> State-dependent distributions:
#> t() 
#> Estimates:
#>                   lb   estimate        ub
#> Gamma_2.1  1.394e-02  2.170e-02 3.357e-02
#> Gamma_3.1  1.710e-06  1.697e-06 1.671e-06
#> Gamma_1.2  1.659e-02  2.641e-02 4.179e-02
#> Gamma_3.2  9.359e-03  1.740e-02 3.213e-02
#> Gamma_1.3  1.259e-08  1.246e-08 1.227e-08
#> Gamma_2.3  2.790e-03  5.146e-03 9.431e-03
#> mu_1       9.610e-04  1.269e-03 1.576e-03
#> mu_2      -7.955e-04 -2.517e-04 2.920e-04
#> mu_3      -3.724e-03 -1.674e-03 3.765e-04
#> sigma_1    5.332e-03  5.774e-03 6.253e-03
#> sigma_2    1.269e-02  1.324e-02 1.381e-02
#> sigma_3    2.330e-02  2.563e-02 2.819e-02
#> df_1       3.964e+00  5.272e+00 7.012e+00
#> df_2       7.592e+04  7.593e+04 7.593e+04
#> df_3       5.501e+00  1.064e+01 2.058e+01
#> States:
#> decoded
#>    1    2    3 
#> 2177 2753  695 
#> Residuals:
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> -3.523637 -0.659763  0.010355 -0.002319  0.673969  3.918940

The estimated state-dependent distributions can be plotted as follows:

plot(dax_model_3t, plot_type = "sdds")

As mentioned above, the HMM likelihood function is prone to local optima. This effect can be visualized by plotting the log-likelihood value in the different optimization runs, where the best run is marked in red:

plot(dax_model_3t, plot_type = "ll")


Oelschläger, L., and T. Adam. 2021. “Detecting Bearish and Bullish Markets in Financial Time Series Using Hierarchical Hidden Markov Models.” Statistical Modelling.

Zucchini, W., I. L. MacDonald, and R. Langrock. 2016. “Hidden Markov Models for Time Series: An Introduction Using R, 2nd Edition.” Chapman and Hall/CRC.