Numerical Integrals
RATS includes many built-in functions that are integrals with no closed form in terms of the elementary functions (exp, log, sin, etc.); these are integrals important enough that algorithms for approximating them have been worked out. Examples are %CDF (for the Normal), %TTEST (for the Student-t) and %GAMMAINC (for the gamma). %CDF, in particular, even has multiple branches, as it is often important to estimate extreme tail values very accurately (for instance, to estimate probit models).
In practice, however, one sometimes needs to approximate integrals that aren't as well-studied, or that may not even have a specific functional form, as when the function values are computed using simulations. How best to handle these depends upon:
1. How much you know about the behavior of the function.
2. What type of range of integration you have (definite or indefinite).
Definite Integrals
If you have a closed interval \([a,b]\) on which the function is well-behaved, you can use the relatively simple trapezoid rule or the (generally) more accurate, but somewhat fussier Simpson's rule. The former can be done using the %ITRAPEZOID(X,F) and the latter with %ISIMPSON(X,F). In both cases, X is a VECTOR of ordinates and F the VECTOR of corresponding function values:
1. X and F must have the same dimensions.
2. X must be in increasing order.
3. For %ISIMPSON, X must be equally spaced (%ITRAPEZOID can have unequal spacing).
For instance, the following computes an approximation to
\begin{equation} \int_0^1 {x^4 \left( {1 - x} \right)^6 dx} \end{equation}
using an equally-spaced grid with intervals of .01. (This requires 101 ordinates to include both endpoints.)
compute x=%seqrange(0.0,1.0,101)
dec vect f(%size(x))
ewise f(i)=x(i)^4*(1-x(i))^6
compute iapprox=%isimpson(x,f)
This has a known value: the Beta function evaluated at 5 and 7.
?#.####### "Approximate" iapprox "Exact" exp(%lnbeta(5,7))
which, in this case, agree to at least four significant digits:
Approximate 0.0004329 Exact 0.0004329
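As a cross-check outside RATS, the same calculation can be sketched in Python. This is not the RATS implementation of %ITRAPEZOID or %ISIMPSON, just a direct transcription of the two textbook rules on the same 101-point grid:

```python
import math

# Grid of 101 equally spaced ordinates on [0,1] (spacing h = .01),
# matching the RATS example above
n = 101
h = 0.01
x = [i * h for i in range(n)]
f = [xi**4 * (1 - xi)**6 for xi in x]

# Trapezoid rule: h * (f0/2 + f1 + ... + f_{n-2} + f_{n-1}/2)
trap = h * (0.5 * f[0] + sum(f[1:-1]) + 0.5 * f[-1])

# Composite Simpson's rule: weights 1,4,2,4,...,2,4,1 times h/3
# (requires an even number of intervals, i.e. an odd number of points)
simp = (h / 3) * (f[0] + f[-1]
                  + 4 * sum(f[1:-1:2])
                  + 2 * sum(f[2:-1:2]))

# Exact value: Beta(5,7) = Gamma(5)Gamma(7)/Gamma(12)
exact = math.exp(math.lgamma(5) + math.lgamma(7) - math.lgamma(12))
print(trap, simp, exact)
```

On this smooth integrand both rules do well, with Simpson's rule noticeably more accurate at the same grid spacing.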
Indefinite Integrals
Indefinite integrals can have infinite integration limits, or finite integration limits with an integrand that goes to infinity, or both. In some cases, it's possible to approximate such an integral successfully over a finite interval, if you know enough about the function to know that beyond a certain value the integral is (in effect) zero. (Remember, these are all approximations anyway, so a calculation which misses 1.e-7 in the tail can still be accurate to a tolerance of 1.e-5.) However, there are a number of special formulas which are designed to approximate indefinite integrals by taking weighted sums of a specially modified function at specific ordinates. In statistics, a particularly important one is Gauss-Hermite quadrature, which approximates
\begin{equation} \int_{ - \infty }^{ + \infty } {\exp \left( { - x^2 } \right)} f(x)dx \approx \sum\limits_{i = 1}^n {w_i^{(n)} f\left( {x_i^{(n)} } \right)} \label{eq:misc_gausshermite} \end{equation}
where the weights \({w_i^{(n)} }\) and sample points \({x_i^{(n)} }\) depend upon the choice of \(n\). The similarity between this and the expected value of \(f(X)\) where \(X\) has a Gaussian distribution is rather clear—all it requires is a fairly modest change of variables. The weights and ordinates can be computed using the @GaussHermite procedure.
For instance, the following computes \(EX^4 ,X \sim N(1,4)\). In general, if \(X \sim N\left( {\mu ,\sigma ^2 } \right)\), the transformation to produce the proper functional form is
\begin{equation} Z = \frac{{X - \mu }}{{\sigma \sqrt 2 }} \Rightarrow X = Z\sigma \sqrt 2 + \mu \Rightarrow dX = \sigma \sqrt 2 dZ \end{equation}
where the extra divisor of \({\sqrt 2 }\) is needed because the /2 is missing in the exp(...) in \eqref{eq:misc_gausshermite} compared with a Normal density. Including the required integrating constants, and substituting for \(X\) gives
\begin{equation}
\int_{ - \infty }^{ + \infty } {\frac{{\left( {\sqrt 2 \sigma z + \mu } \right)^4 }}{{\sqrt \pi }}\exp \left( { - z^2 } \right)\,} dz
\end{equation}
A 13-term Gauss-Hermite approximation to this can be done with
@GaussHermite(n=13,w=w,x=z)
dec vect f(%size(z))
compute mu=1.0,sigma=2.0
ewise f(i)=(sqrt(2)*sigma*z(i)+mu)^4/sqrt(%pi)
compute iapprox=%dot(w,f)
compute iexact=mu^4+6*mu^2*sigma^2+3*sigma^4
?"Approximate" iapprox "Exact" iexact
Again, this can be computed exactly, and the 13-term approximation hits that very accurately. (In fact, an \(n\)-point Gauss-Hermite rule is exact for polynomials up to degree \(2n-1\), so when \(f\) is a 4th order polynomial, even \(n=3\) is theoretically exact.) \(f\) needs to be a function which is reasonably well-approximated by a polynomial, at least over the range where \(\exp(-x^2)\) isn't vanishingly small. So, for instance, the very smooth \(f(x) = \cos (x)\) will give almost identical results for most values of \(n \ge 5\), while the integral for the more rapidly varying \(f(x) = \cos (10 x)\) is poorly approximated.
dofor n = 5 11 21
@GaussHermite(n=n,w=w,x=x)
dec vect f(%size(x))
ewise f(i)=cos(x(i))
compute iapprox=%dot(w,f)
?"Approximation with" n "terms" iapprox
end dofor n
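The contrast between the two integrands can also be seen in a Python sketch (again using numpy's hermgauss in place of @GaussHermite). Both integrals have the closed form \(\sqrt \pi \exp(-a^2/4)\) for \(f(x)=\cos(ax)\), which gives a target to compare against:

```python
import numpy as np

# Closed form for the integral of exp(-x^2)*cos(a*x) over the whole line
exact_cos = np.sqrt(np.pi) * np.exp(-0.25)    # a = 1
exact_cos10 = np.sqrt(np.pi) * np.exp(-25.0)  # a = 10, essentially zero

results = {}
for n in (5, 11, 21):
    x, w = np.polynomial.hermite.hermgauss(n)
    est1 = np.dot(w, np.cos(x))        # smooth integrand: converges fast
    est10 = np.dot(w, np.cos(10 * x))  # rapidly oscillating: erratic at low n
    results[n] = (est1, est10)
    print(n, est1, est10)
```

The \(\cos(x)\) column is essentially converged by \(n=5\), while the \(\cos(10x)\) column at low orders is nowhere near its (nearly zero) true value.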
Gauss-Hermite integration is used to (approximately) evaluate the log likelihood for a Random Effects Probit model in REPROBIT.RPF.
Monte Carlo Integration
Monte Carlo integration is based upon the observation that, for instance,
\begin{equation} \int_{ - \infty }^{ + \infty } {\cos (x)\exp \left( { - x^2 } \right)\,} dx \label{eq:misc_numintmc} \end{equation}
done in the last example above can be rearranged to
\begin{equation} \int_{ - \infty }^{ + \infty } {\sqrt \pi } \cos \left( x \right)\left\{ {\frac{1}{{\sqrt {2\pi } \sqrt {(1/2)} }}\exp \left( { - \frac{{x^2 }}{{2(1/2)}}} \right)} \right\}dx \label{eq:misc_numintmcexpand} \end{equation}
where the factor in {} is observed to be the density function for a \(N\left( {0,1/2} \right)\). So \eqref{eq:misc_numintmcexpand} can be interpreted as
\begin{equation} E\left[ {\sqrt \pi \cos \left( X \right)} \right],X \sim N\left( {0,1/2} \right) \end{equation}
If we take a sample \(\{ x_1 , \ldots ,x_n \} \) and compute the sample mean of \(\left\{ {\sqrt \pi \cos \left( {x_1 } \right), \ldots ,\sqrt \pi \cos (x_n )} \right\}\), we would expect that this would converge to the value of \eqref{eq:misc_numintmc} (by the Law of Large Numbers) and that we could bound the likely error using the Central Limit Theorem. This will, of course, require certain regularity conditions on the behavior of the "f" function, but certain (different) regularity conditions are required for the quadrature methods as well. For instance, an indefinite integral with one fixed and one infinite limit is very hard to handle with either of the previous methods because the sharp cutoff violates their regularity conditions, but not those for Monte Carlo integration.
There are different ways to handle the "bookkeeping" for this type of calculation. This shows two possibilities. The first just keeps a running total (in E) of the sample values for the cosine, since the sum is a sufficient statistic for the mean. The second generates the whole sample, first for XDRAWS, then for FDRAWS given XDRAWS, then uses SSTATS to compute the mean of FDRAWS.
compute ndraws=10000
compute e=0.0
do i=1,ndraws
compute x=%ran(sqrt(.5))
compute e=e+sqrt(%pi)*cos(x)
end do i
*
set xdraw 1 ndraws = %ran(sqrt(.5))
set fdraw 1 ndraws = sqrt(%pi)*cos(xdraw)
sstats(mean) 1 ndraws fdraw>>edraws
?e/ndraws edraws
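The same experiment can be sketched in Python (outside RATS) using a seeded generator, with the closed-form value \(\sqrt \pi e^{-1/4}\) of the integral available for comparison:

```python
import numpy as np

rng = np.random.default_rng(53531)  # seed chosen arbitrarily for reproducibility
ndraws = 10000

# Draws from N(0, 1/2): standard deviation sqrt(.5)
xdraw = rng.normal(0.0, np.sqrt(0.5), ndraws)
fdraw = np.sqrt(np.pi) * np.cos(xdraw)

edraws = fdraw.mean()
exact = np.sqrt(np.pi) * np.exp(-0.25)  # closed form for this integral
print(edraws, exact)
```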
Note that these estimates will be different from each other (they use two different sets of random numbers), and will also be different from the Gauss-Hermite quadrature values. 10000 draws in this case will give accuracy of roughly +/- .01, while quadrature is nearly exact, as this has an \(f\) function which is almost perfectly set up for the latter. However, if we change this to
\begin{equation} \int_{ - \infty }^{ + 1} {\cos (x)\exp \left( { - x^2 } \right)\,} dx \end{equation}
where we've capped the upper limit at 1, the comparison reverses. Either technique can handle this by using 0 as the function value whenever \(x>1\): Monte Carlo integration is still +/- .01 at 10000 draws, while the values for quadrature are very erratic, as quadrature can't deal easily with the sharp cutoff, which can't be approximated by a polynomial, even at a high order.
dofor n = 5 11 21 31 71
@GaussHermite(n=n,w=w,x=x)
dec vect f(%size(x))
ewise f(i)=%if(x(i)<=1.0,cos(x(i)),0.0)
compute iapprox=%dot(w,f)
?"Approximation with" n "terms" iapprox
end dofor n
*
compute ndraws=10000
set xdraw 1 ndraws = %ran(sqrt(.5))
set fdraw 1 ndraws = %if(xdraw<=1.0,sqrt(%pi)*cos(xdraw),0.0)
sstats(mean) 1 ndraws fdraw>>edraws
?edraws
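A Python sketch of the capped integral makes the contrast concrete. A very fine trapezoid rule on \([-8,1]\) (the integrand is negligible below \(-8\)) serves as a reference value, which is assumed here purely for comparison:

```python
import numpy as np

rng = np.random.default_rng(2025)
ndraws = 10000

# Reference value: fine trapezoid rule on [-8, 1]
grid = np.linspace(-8.0, 1.0, 90001)
vals = np.cos(grid) * np.exp(-grid**2)
ref = float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid) / 2))

# Monte Carlo, zeroing the function above the cutoff: still ~ +/- .01
xdraw = rng.normal(0.0, np.sqrt(0.5), ndraws)
fdraw = np.where(xdraw <= 1.0, np.sqrt(np.pi) * np.cos(xdraw), 0.0)
mc = fdraw.mean()

# Gauss-Hermite with the same hard cutoff: erratic even as n grows
gh = {}
for n in (5, 11, 21, 31, 71):
    x, w = np.polynomial.hermite.hermgauss(n)
    gh[n] = float(np.dot(w, np.where(x <= 1.0, np.cos(x), 0.0)))
print(ref, mc, gh)
```

The Monte Carlo estimate stays within its usual sampling error of the reference, while the quadrature values bounce around it without settling.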
Monte Carlo integration becomes even more important when you get into multi-dimensional integrals, where extensions of quadrature are very limited. Various forms of Monte Carlo integration are the dominant method of analysis in modern Bayesian statistics.
Copyright © 2025 Thomas A. Doan