Non-Linear Optimization
There are a number of issues with which all non-linear optimization methods must deal. "Machine Precision Issues" discusses the level of accuracy that can be obtained in calculating non-linear functions. "Non-linear convergence" discusses how it is determined whether a non-linear optimization has achieved convergence.
The following table lists the general-purpose optimization algorithms used in RATS. There are also specialized algorithms for particular problems, such as the linear and quadratic programming algorithms used by LQPROG and the specialized optimizers used for LAD and quantile regressions in RREG, as well as optimization techniques (such as EM, expectation-maximization) which require problem-specific calculations. The algorithms below are more generally applicable, and most are found in many different instructions.
These are divided into two main groups: hill-climbing algorithms, which assume the function is (at least twice) differentiable, and derivative-free methods, which are more flexible but generally slower (often much slower) on well-behaved functions.
Method Name (in options) | Type of algorithm                           | Applications
-------------------------|---------------------------------------------|------------------------------
BFGS                     | Derivative-based, uphill only               | General optimization
BHHH                     | Derivative-based, uphill only               | Likelihood-based only
GAUSS (-Newton)          | Derivative-based, uphill only               | Non-linear least squares only
                         | Derivative-based, uphill only               | Specific models
SIMPLEX                  | Derivative-free, uphill only, narrow search | General optimization
GENETIC                  | Derivative-free, uphill only, broad search  | General optimization
ANNEALING                | Derivative-free, up-and-down, broad search  | General optimization
GA (Genetic Annealing)   | Derivative-free, up-and-down, broad search  | General optimization
Throughout this, we're assuming that we're talking about maximizing a function. Minimizing can be handled by maximizing the negative of the function.
The ideal optimization problem has a function which is analytically differentiable (multiple times) and globally concave (the log likelihood of the probit model is a good example). If that were all we ever needed to handle, we wouldn't need a range of optimization algorithms, with different settings within each. Instead, we have optimization problems whose derivatives have to be approximated numerically, or which sometimes have no usable derivatives at all. They can have multiple modes, or, even if there is only a single mode, they can fail to be globally concave. There can also be boundary issues, where the function either isn't computable outside a certain range (for instance, a variance can't be negative), or where the optimization problem itself requires constraints on certain parameters.
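To see how clean the ideal case is, here is a minimal Python sketch (illustrative only, not RATS code) of Newton's method applied to a globally concave function. With global concavity, every Newton step is uphill and convergence near the optimum is quadratic; the function and starting value below are made up for the illustration.

```python
import math

def newton_maximize(fp, fpp, x, tol=1e-10, max_iter=50):
    """Newton's method for maximization: step = -f'(x)/f''(x).
    On a globally concave function (f'' < 0 everywhere) every
    step moves uphill and convergence near the mode is quadratic."""
    for _ in range(max_iter):
        step = -fp(x) / fpp(x)
        x += step
        if abs(step) < tol:
            break
    return x

# Example function f(x) = x - exp(x): globally concave, since
# f''(x) = -exp(x) < 0 for all x; the unique maximum is at x = 0.
fp  = lambda x: 1.0 - math.exp(x)   # first derivative
fpp = lambda x: -math.exp(x)        # second derivative

x_star = newton_maximize(fp, fpp, x=1.0)
```

From the starting value 1.0, the iterates close in on 0 within a handful of steps, which is the behavior you can expect from the derivative-based methods on a well-behaved surface.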
In general, derivative-based methods are faster, but they are much more sensitive to the choice of initial guesses and are limited to functions which are (at least) twice continuously differentiable.
The "uphill only" algorithms assume that you are starting on the right "hill" and move up from there, one hopes to the top. While it's possible for an early iteration to jump a considerable distance and actually switch "hills", you shouldn't expect it.
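The "starting on the right hill" issue can be illustrated with a short Python sketch (again illustrative, not RATS code). It runs a simple uphill-only gradient ascent on a made-up bimodal function with a lower mode near x = 2 and a higher mode near x = -2; the iterates climb whichever hill contains the starting value, even when the other mode is higher.

```python
import math

def f(x):
    # Two "hills": a lower mode near x = 2 and a higher one near x = -2
    return math.exp(-(x - 2.0) ** 2) + 1.5 * math.exp(-(x + 2.0) ** 2)

def fprime(x):
    # Derivative of f, used for the uphill direction
    return (-2.0 * (x - 2.0) * math.exp(-(x - 2.0) ** 2)
            - 3.0 * (x + 2.0) * math.exp(-(x + 2.0) ** 2))

def gradient_ascent(x, rate=0.1, steps=500):
    """Uphill-only: every step moves in the gradient direction,
    so the iterate climbs whichever hill it started on."""
    for _ in range(steps):
        x += rate * fprime(x)
    return x

# Starting right of the valley finds the lower mode near +2;
# starting left of it finds the higher mode near -2.
from_right = gradient_ascent(1.0)
from_left = gradient_ascent(-1.0)
```

Started at 1.0, the algorithm happily converges to the mode near 2 and reports success, with no indication that a better mode exists on the other side of the valley.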
"Up-and-down" algorithms make no such assumption. As a result, they are better able to handle surfaces with multiple modes. However, there is a considerable cost in time, since they may be working with multiple actual modes, or may just be looking at different parts of the same hill; there's no way to tell based solely upon the function value. The (simulated) ANNEALING and GA (genetic annealing) algorithms have the property that they can make "downhill" moves. This makes them particularly useful in dealing with multi-modal surfaces, since they don't have to find a spot on another hill that has a higher function value than the current location (the uphill-only algorithms can only switch "hills" by finding a better function value, which generally requires a lucky, wild step). Needless to say, making both uphill and downhill moves takes considerably longer than making only uphill moves, so you should employ these methods only if you have a real concern that you might have a very ill-behaved surface.
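The downhill-move idea can be sketched in a few lines of Python (a toy illustration of simulated annealing in general, not of the RATS implementation): proposals that improve the function are always accepted, while downhill proposals are accepted with a probability that shrinks as the "temperature" is lowered. The bimodal test function, temperatures, and step sizes below are all arbitrary choices for the illustration.

```python
import math
import random

def f(x):
    # Bimodal surface: a lower mode near x = 2, a higher mode near x = -2
    return math.exp(-(x - 2.0) ** 2) + 1.5 * math.exp(-(x + 2.0) ** 2)

def anneal(f, x, temp=2.0, cooling=0.999, steps=5000, seed=42):
    """Toy simulated annealing. Uphill proposals are always accepted;
    downhill proposals are accepted with probability exp(delta/temp),
    which shrinks as the temperature is gradually lowered."""
    rng = random.Random(seed)
    fx = f(x)
    best_x, best_f = x, fx
    downhill_accepts = 0
    for _ in range(steps):
        x_new = x + rng.gauss(0.0, 1.0)        # random proposal
        f_new = f(x_new)
        delta = f_new - fx
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            if delta < 0:
                downhill_accepts += 1          # a genuine downhill move
            x, fx = x_new, f_new
            if fx > best_f:
                best_x, best_f = x, fx
        temp *= cooling                        # cool the temperature
    return best_x, best_f, downhill_accepts

best_x, best_f, downhill = anneal(f, x=1.5)
```

Because early iterations accept downhill moves fairly often, the chain can wander through the low-valued valley between the two modes, which is exactly what lets it escape a lower hill. The price is clear from the iteration count: thousands of function evaluations for a one-parameter problem that an uphill method would finish in a handful of steps.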
Copyright © 2025 Thomas A. Doan