Alternating optimization (AO) is an iterative process for optimizing a real-valued function jointly over all its parameters by alternating restricted optimization over parameter partitions.
Usage
ao(
  f,
  initial,
  target = NULL,
  npar = NULL,
  gradient = NULL,
  hessian = NULL,
  ...,
  partition = "sequential",
  new_block_probability = 0.3,
  minimum_block_number = 1,
  minimize = TRUE,
  lower = NULL,
  upper = NULL,
  iteration_limit = Inf,
  seconds_limit = Inf,
  tolerance_value = 1e-06,
  tolerance_parameter = 1e-06,
  tolerance_parameter_norm = function(x, y) sqrt(sum((x - y)^2)),
  tolerance_history = 1,
  base_optimizer = Optimizer$new("stats::optim", method = "L-BFGS-B"),
  verbose = FALSE,
  hide_warnings = TRUE,
  add_details = TRUE
)
Arguments
- f
[function]
A function to be optimized, returning a single numeric value. The first argument of f should be a numeric of the same length as initial, optionally followed by any other arguments specified by the ... argument. If f is to be optimized over an argument other than the first, or over more than one argument, this has to be specified via the target argument.
- initial
[numeric() | list()]
The starting parameter values for the target argument(s). This can also be a list of multiple starting parameter values, see details.
- target
[character() | NULL]
The name(s) of the argument(s) over which f gets optimized. These must be numeric arguments. Can be NULL (default), in which case it is the first argument of f.
- npar
[integer()]
The length(s) of the target argument(s). Must be specified if more than one target argument is specified via the target argument. Can be NULL if there is only one target argument, in which case npar is set to length(initial).
- gradient
[function | NULL]
Optionally, a function that returns the gradient of f. The function call of gradient must be identical to that of f. Ignored if base_optimizer does not support a custom gradient.
- hessian
[function | NULL]
Optionally, a function that returns the Hessian of f. The function call of hessian must be identical to that of f. Ignored if base_optimizer does not support a custom Hessian.
- ...
Additional arguments to be passed to f (and gradient).
- partition
[character(1) | list()]
Defines the parameter partition. It can be either "sequential" for treating each parameter separately, "random" for a random partition in each iteration, "none" for no partition (which is equivalent to joint optimization), or a list of vectors of parameter indices, specifying a custom partition for the AO process. This can also be a list of multiple partition definitions, see details.
- new_block_probability
[numeric(1)]
Only relevant if partition = "random". The probability for a new parameter block when creating a random partition. Values close to 0 result in larger parameter blocks, values close to 1 result in smaller parameter blocks.
- minimum_block_number
[integer(1)]
Only relevant if partition = "random". The minimum number of blocks in random partitions.
- minimize
[logical(1)]
Minimize during the AO process? If FALSE, maximization is performed.
- lower, upper
[numeric() | NULL]
Optionally, lower and upper parameter bounds. Ignored if base_optimizer does not support parameter bounds.
- iteration_limit
[integer(1) | Inf]
The maximum number of iterations through the parameter partition before the AO process is terminated. Can also be Inf for no iteration limit.
- seconds_limit
[numeric(1)]
The time limit in seconds before the AO process is terminated. Can also be Inf for no time limit. Note that this stopping criterion is only checked after a sub-problem is solved, not while a sub-problem is being solved, so the actual process time can exceed this limit.
- tolerance_value
[numeric(1)]
A non-negative tolerance value. The AO process terminates if the absolute difference between the current function value and the one tolerance_history iterations earlier is smaller than tolerance_value. Can be 0 for no value threshold.
- tolerance_parameter
[numeric(1)]
A non-negative tolerance value. The AO process terminates if the distance between the current estimate and the one tolerance_history iterations earlier is smaller than tolerance_parameter. Can be 0 for no parameter threshold. By default, the distance is measured using the Euclidean norm, but another norm can be specified via the tolerance_parameter_norm argument.
- tolerance_parameter_norm
[function]
The norm that measures the distance between the current estimate and the one from the last iteration. If the distance is smaller than tolerance_parameter, the AO process is terminated. It must be of the form function(x, y) for two vector inputs x and y, and return a single numeric value. By default, the Euclidean norm function(x, y) sqrt(sum((x - y)^2)) is used.
- tolerance_history
[integer(1)]
The number of iterations to look back to determine whether tolerance_value or tolerance_parameter has been reached.
- base_optimizer
[Optimizer | list()]
An Optimizer object, which can be created via Optimizer. It numerically solves the sub-problems. By default, the stats::optim optimizer with method = "L-BFGS-B" is used. This can also be a list of multiple base optimizers, see details.
- verbose
[logical(1)]
Print tracing details during the AO process? Not supported when using multiple processes, see details.
- hide_warnings
[logical(1)]
Hide warnings during the AO process?
- add_details
[logical(1)]
Add details about the AO process to the output?
Value
A list with the following elements:
- estimate is the parameter vector at termination.
- value is the function value at termination.
- details is a data.frame with information about the AO process: for each iteration (column iteration) it contains the function value (column value), the parameter values (columns starting with p followed by the parameter index), the active parameter block (columns starting with b followed by the parameter index, where 1 stands for a parameter contained in the active parameter block and 0 for one that is not), and the computation time in seconds (column seconds). Only available if add_details = TRUE.
- seconds is the overall computation time in seconds.
- stopping_reason is a message explaining why the AO process terminated.
In the case of multiple processes, the output changes slightly, see details.
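As a sketch of how these elements are typically accessed (assuming the ao package is attached; the exact numbers depend on the run):

```r
library(ao)

# minimize a simple quadratic with minimum at the origin
out <- ao(f = function(x) sum(x^2), initial = c(1, 1))

out$estimate         # parameter vector at termination, close to c(0, 0)
out$value            # function value at termination, close to 0
out$seconds          # overall computation time in seconds
out$stopping_reason  # message why the AO process terminated
head(out$details)    # per-iteration trace, since add_details = TRUE by default
```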
Details
Multiple processes
AO can suffer from local optima. To increase the likelihood of reaching the global optimum, you can specify:
multiple starting parameters
multiple parameter partitions
multiple base optimizers
Use the initial, partition, and/or base_optimizer arguments to provide
a list of possible values for each parameter. Each combination of initial
values, parameter partitions, and base optimizers will create a separate AO
process.
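As a minimal sketch (using Himmelblau's function, as in the examples below; the exact results depend on the run), the following call creates 3 x 2 = 6 independent AO processes, one per combination of starting values and partitions:

```r
library(ao)

himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2

set.seed(42)
ao(
  f = himmelblau,
  # three starting points x two partition schemes = six AO processes
  initial = list(c(0, 0), c(5, 5), c(-3, 3)),
  partition = list("sequential", "random")
)
```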
Output value
In the case of multiple processes, the output values refer to the optimal AO process (with respect to function value).
If add_details = TRUE, the following elements are added:
- estimates is a list of the optimal parameters from each process.
- values is a list of the optimal function values from each process.
- details combines the details of the single processes and has an additional column process with an index for the different processes.
- seconds_each gives the computation time in seconds for each process.
- stopping_reasons gives the termination message for each process.
- processes gives details on how the different processes were specified.
Parallel computation
By default, processes run sequentially. However, since they are independent,
they can be parallelized. To enable parallel computation, use the
{future} framework. For example, run the
following before the ao() call:
future::plan(future::multisession, workers = 4)
Progress updates
When using multiple processes, setting verbose = TRUE to print tracing
details during AO is not supported. However, you can still track the progress
using the {progressr} framework.
For example, run the following before the ao() call:
progressr::handlers(global = TRUE)
progressr::handlers(
progressr::handler_progress(":percent :eta :message")
)
Examples
# Example 1: Minimization of Himmelblau's function --------------------------
himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
ao(f = himmelblau, initial = c(0, 0))
#> $estimate
#> [1] 3.584428 -1.848126
#>
#> $value
#> [1] 9.606386e-12
#>
#> $details
#> iteration value p1 p2 b1 b2 seconds
#> 1 0 1.700000e+02 0.000000 0.000000 0 0 0.000000000
#> 2 1 1.327270e+01 3.395691 0.000000 1 0 0.053527117
#> 3 1 1.743664e+00 3.395691 -1.803183 0 1 0.010316610
#> 4 2 2.847290e-02 3.581412 -1.803183 1 0 0.007916927
#> 5 2 4.687468e-04 3.581412 -1.847412 0 1 0.007097960
#> 6 3 7.368057e-06 3.584381 -1.847412 1 0 0.005732536
#> 7 3 1.164202e-07 3.584381 -1.848115 0 1 0.047724724
#> 8 4 1.893311e-09 3.584427 -1.848115 1 0 0.004749060
#> 9 4 9.153860e-11 3.584427 -1.848124 0 1 0.003654718
#> 10 5 6.347425e-11 3.584428 -1.848124 1 0 0.003559113
#> 11 5 9.606386e-12 3.584428 -1.848126 0 1 0.003583431
#>
#> $seconds
#> [1] 0.1478622
#>
#> $stopping_reason
#> [1] "change in function value between 1 iteration is < 1e-06"
#>
# Example 2: Maximization of 2-class Gaussian mixture log-likelihood --------
# target arguments:
# - class means mu (2, unrestricted)
# - class standard deviations sd (2, must be non-negative)
# - class proportion lambda (only 1 for identification, must be in [0, 1])
normal_mixture_llk <- function(mu, sd, lambda, data) {
c1 <- lambda * dnorm(data, mu[1], sd[1])
c2 <- (1 - lambda) * dnorm(data, mu[2], sd[2])
sum(log(c1 + c2))
}
set.seed(123)
ao(
f = normal_mixture_llk,
initial = runif(5),
target = c("mu", "sd", "lambda"),
npar = c(2, 2, 1),
data = datasets::faithful$eruptions,
partition = list("sequential", "random", "none"),
minimize = FALSE,
lower = c(-Inf, -Inf, 0, 0, 0),
upper = c(Inf, Inf, Inf, Inf, 1),
add_details = FALSE
)
#> $estimate
#> [1] 2.0186087 4.2733443 0.2356257 0.4370632 0.3484053
#>
#> $value
#> [1] -276.36
#>
#> $seconds
#> [1] 0.6870563
#>
#> $stopping_reason
#> [1] "change in function value between 1 iteration is < 1e-06"
#>
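A third, hypothetical example sketches a custom partition (here list(1, 2), which reproduces the "sequential" scheme) combined with a custom distance norm (the maximum norm instead of the default Euclidean norm); the output is analogous to Example 1:

```r
# Example 3: custom partition and custom parameter norm ---------------------
himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
ao(
  f = himmelblau,
  initial = c(0, 0),
  partition = list(1, 2),  # optimize parameter 1, then parameter 2
  tolerance_parameter_norm = function(x, y) max(abs(x - y))  # maximum norm
)
```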
