RATS 10.1

Hill-climbing algorithms are a very general collection of techniques for maximizing a twice-continuously differentiable function. A single iteration takes the following steps (a schematic code sketch follows the list):

1. Compute the gradient \(\bf{g}\).

2. Select a direction by premultiplying the gradient by a matrix, that is, \(\bf{d}=\bf{G}\bf{g}\). Make sure the directional derivative is positive, so that (at least locally) we are going "uphill" in that direction. If not, modify the direction.

3. Select a distance to travel in that direction: \({{\bf{x}}_{k + 1}} = {{\bf{x}}_k} + \lambda {\bf{d}}\)
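
The three steps can be sketched as follows. This is only a schematic Python illustration, not RATS code; the function f, its gradient grad, the matrix G and the line_search routine are all assumed to be supplied by the caller:

    import numpy as np

    def hill_climb_step(x, f, grad, G, line_search):
        """One schematic hill-climbing iteration: direction = G times gradient,
        then a line search along that direction. f, grad, G and line_search are
        assumed to be supplied; this is not the actual RATS implementation."""
        g = grad(x)                        # step 1: gradient at the current point
        d = G @ g                          # step 2: direction from the multiplying matrix
        if d @ g <= 0:                     # directional derivative must be positive ("uphill")
            d = g                          # sketch only: fall back to the gradient itself
                                           # (RATS instead adds a multiple of g, as described below)
        lam = line_search(f, x, d, g)      # step 3: distance to travel along d
        return x + lam * d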

 

When RATS needs to make an adjustment to the direction in step 2, it does so by adding a multiple of the gradient. RATS is looking for a direction in which

\begin{equation} \frac{{\bf{d}} \bullet {\bf{g}}}{\left\| {\bf{d}} \right\|\,\left\| {\bf{g}} \right\|} \ge \alpha \label{eq:optimize_wolf} \end{equation}

where \(\alpha\) is a control parameter. On most iterations, \(\alpha\) is zero. However, if RATS detects that the estimation isn't proceeding smoothly, it switches to .00001 (by default), which you can reset using the ALPHA option on NLPAR. If the initial direction fails the test, RATS replaces it with \(\bf{d} + \gamma \bf{g}\), solving for the value of \(\gamma\) which just meets \eqref{eq:optimize_wolf}. While you might expect that making \(\alpha\) higher would speed up the process by picking a direction in which the function is "steeper", this is not the case. In fact, taking \(\bf{d}=\bf{g}\), which maximizes the left side of \eqref{eq:optimize_wolf}, gives us the infamous "steepest descent" method (really "steepest ascent" for maximization problems), which is well known for producing very slow progress towards the maximum because of constant, severe direction changes from iteration to iteration.
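
As an illustration of that adjustment, the sketch below finds, by bracketing and bisection, a value of \(\gamma\) for which \(\bf{d} + \gamma \bf{g}\) just meets \eqref{eq:optimize_wolf}. This is one way to solve for \(\gamma\), shown for exposition only; it is not meant to reproduce the internal RATS code:

    import numpy as np

    def cosine(d, g):
        """Left side of the direction test: cosine of the angle between d and g."""
        return (d @ g) / (np.linalg.norm(d) * np.linalg.norm(g))

    def adjust_direction(d, g, alpha=1e-5, tol=1e-10):
        """Return d + gamma*g with (approximately) the smallest gamma >= 0 that
        passes the cosine test. As gamma grows, the adjusted direction approaches
        the gradient itself and the cosine approaches 1, so a bracket always exists."""
        if cosine(d, g) >= alpha:
            return d                       # the original direction already passes
        lo, hi = 0.0, 1.0
        while cosine(d + hi * g, g) < alpha:
            hi *= 2.0                      # push the bracket out until the test passes
        while hi - lo > tol * (1.0 + hi):  # bisect down to the boundary value of gamma
            mid = 0.5 * (lo + hi)
            if cosine(d + mid * g, g) >= alpha:
                hi = mid
            else:
                lo = mid
        return d + hi * g                  # gamma that just meets the test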

 

Subiteration (Line Search)


The process of selecting \(\lambda\) is called line search. RATS refers to the attempts to find a suitable \(\lambda\) as subiteration. There are two distinct philosophies which can be used in the line search stage. One is called exact (or perfect) line search, which maximizes the function over \(\lambda\). The other is inexact line search, which looks for a value of \(\lambda\) that satisfies a condition guaranteeing adequate progress towards the maximum. The choice between these two methods is controlled by the EXACTLINESEARCH option on NLPAR. The default is NOEXACT. For this, RATS tests

\begin{equation} \delta \le \frac{F\left( {{\bf{x}}_{k + 1}} \right) - F\left( {{\bf{x}}_k} \right)}{\lambda \,{\bf{d}} \bullet {\bf{g}}} \le 1 - \delta \label{eq:optimum_linesearch} \end{equation}
where \(\delta\) (a number between 0 and .5) is controlled by the DELTA option on NLPAR. The middle part is the ratio between the directional arc derivative with respect to \(\lambda\) (the average slope of the function over the step) and the directional derivative itself at the start of the step. This condition is described in, for instance, Berndt, Hall, Hall and Hausman (1974). The choice of subiteration method rarely affects whether you are able to achieve convergence, but it can affect how quickly you achieve it. The default (NOEXACT) is almost always faster if you have fewer than thirty parameters, because it reduces the number of function evaluations.
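
A minimal sketch of such an inexact search is shown below. The halving/doubling rule used to adjust \(\lambda\) is an illustrative assumption, not a description of how RATS actually conducts its subiterations:

    import numpy as np

    def inexact_line_search(f, x, d, g, delta=0.1, lam=1.0, max_subits=30):
        """Find a lambda for which (F(x + lam*d) - F(x)) / (lam * d.g) lies in
        [delta, 1 - delta]. Maximization is assumed, so d.g should be positive.
        The halving/doubling adjustment is illustrative only."""
        f0, slope = f(x), d @ g            # current value and directional derivative
        for _ in range(max_subits):
            ratio = (f(x + lam * d) - f0) / (lam * slope)
            if ratio < delta:              # overshot: too little gain relative to the slope
                lam *= 0.5
            elif ratio > 1.0 - delta:      # still climbing almost linearly: lengthen the step
                lam *= 2.0
            else:
                break                      # the condition is satisfied
        return lam

    # Illustration on the concave quadratic F(x) = -sum((x - 1)^2):
    f = lambda x: -np.sum((x - 1.0) ** 2)
    x = np.zeros(3)
    g = -2.0 * (x - 1.0)                   # gradient of F at x
    print(inexact_line_search(f, x, g, g)) # steepest ascent direction d = g; prints 0.5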

 

RATS applies the hill-climbing technique with four different methods of computing the multiplying matrix \(\bf{G}\). Some of these are specialized to particular functions and are thus not available in all situations. These four are Newton–Raphson, Gauss–Newton, BFGS, and BHHH.

 


Copyright © 2025 Thomas A. Doan