Nonlinear Convergence

A non-linear optimization is considered “converged” when the change from one iteration to the next is “small.” RATS defines the change in a vector of parameters by taking the maximum across its elements of

\(\min \left( {\left| {\beta - \beta _0 } \right|/\left| {\beta _0 } \right|,\;\left| {\beta - \beta _0 } \right|} \right)\)

where \(\beta _0\) is the parameter’s value before this iteration. This uses the relative change for parameters which, on the previous iteration, had absolute values of 1 or larger, and the absolute change for smaller values. The result is compared with the convergence criterion that you set (typically with the CVCRIT option). The default value of this is \(10^{-5}\) for most methods. A value much smaller than that (say \(10^{-8}\)) is probably unattainable given the precision of floating point calculations on most machines, and even if the estimation succeeds in meeting the tighter value, you are unlikely to see much effect on the final coefficients.
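As a rough illustration of the measure above, here is a minimal Python sketch. It is not RATS code: the function names `param_change` and `converged`, the handling of a zero previous value, and the use of a strict `<` comparison are all assumptions made for the example.

```python
def param_change(beta, beta0):
    """Largest per-element change: min(relative, absolute) for each parameter."""
    worst = 0.0
    for b, b0 in zip(beta, beta0):
        abs_change = abs(b - b0)
        # Assumption: when the previous value is exactly zero, fall back to
        # the absolute change by treating the relative change as infinite.
        rel_change = abs_change / abs(b0) if b0 != 0 else float("inf")
        worst = max(worst, min(rel_change, abs_change))
    return worst


def converged(beta, beta0, cvcrit=1e-5):
    """Hypothetical test: compare the change measure with the criterion."""
    return param_change(beta, beta0) < cvcrit


# First parameter is large (relative change applies),
# second is small (absolute change applies).
print(param_change([200.0, 0.5], [100.0, 0.5]))
print(converged([1.0000001, 0.5], [1.0, 0.5]))
```

For a parameter whose previous value is at least 1 in absolute value, the relative change is the smaller of the two quantities, so it is the one that gets used; for smaller parameters the absolute change is used.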

However, you need to be careful how you parameterize your function. In particular, if a parameter has a natural scale which is quite small (say .0001), then a change of .000009 might have a substantial effect on the function value, but would still be small enough to pass the convergence test. Rescaling the parameter (replacing it wherever it appears with .001 times itself, or scaling a variable it multiplies by .001, or in some cases multiplying the dependent variable by a constant like 100) can help here. Another option is to parameterize it in log or square root form, if appropriate. These changes also help with the accuracy of numerical derivatives if RATS needs to take them.
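The scaling problem can be seen with the numbers from the paragraph above. The following Python sketch is only illustrative: the helper `change`, the variable names, and the strict `<` comparison are assumptions, not RATS internals.

```python
def change(b, b0):
    """min(relative change, absolute change) for a single parameter."""
    abs_change = abs(b - b0)
    rel_change = abs_change / abs(b0) if b0 != 0 else float("inf")
    return min(rel_change, abs_change)


CVCRIT = 1e-5  # default criterion quoted in the text

# Parameter with a natural scale of .0001 that moves by .000009 (a 9% move).
beta0, beta = 0.0001, 0.000109
raw_passes = change(beta, beta0) < CVCRIT

# Reparameterize: beta = SCALE * theta, so theta has a scale of about 0.1.
SCALE = 0.001
theta0, theta = beta0 / SCALE, beta / SCALE
scaled_passes = change(theta, theta0) < CVCRIT

print(raw_passes, scaled_passes)  # True False
```

The same 9% move passes the test in the original parameterization but is correctly flagged as not converged once the parameter is rescaled.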

RATS does allow you to set the optimizers so that they consider convergence achieved if the change in the function value is small. This is done using the instruction NLPAR with the CRITERION=VALUE option. Convergence on function value is usually quite a bit easier to achieve than convergence on coefficients. You should probably choose this only when just the optimized function value matters, and the coefficients themselves are of minor importance.
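To see why convergence on the function value tends to be easier to reach, consider a toy quadratic objective near its optimum. This Python sketch is hypothetical throughout: the objective `f`, the iterates, the plain absolute-difference test on the function value, and the shared tolerance are all assumptions chosen for illustration, and the exact rule RATS applies under CRITERION=VALUE may differ.

```python
def f(x):
    """Toy objective with its optimum at x = 2."""
    return (x - 2.0) ** 2


TOL = 1e-5  # assumed tolerance, applied to both tests

x0, x = 2.003, 2.001  # two successive (made-up) iterates

# Coefficient test: relative change applies because |x0| >= 1.
coef_change = min(abs(x - x0) / abs(x0), abs(x - x0))

# Function-value test (assumed here to be a simple absolute difference).
value_change = abs(f(x) - f(x0))

print(coef_change < TOL)   # False: the coefficient is still moving
print(value_change < TOL)  # True: the function value has barely changed
```

Near an optimum the function is flat, so a change in the coefficients produces a much smaller change in the function value, which is why the value-based test is satisfied first.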