VolatilityEstimates.rpf shows various methods of estimating volatility from historical data, using the methods from Garman and Klass (1980). Note that these are relatively crude estimates, designed for treating volatility as an "observable" for further analysis.

The data are daily data taken from Yahoo Finance on the U.S. Russell 2000 index. Non-trading days are handled by using mapped dates. The JDATE series set up by DATA is also used in the calculations to adjust for the gaps between consecutive entries.

open data "Russell 2000.csv"

data(format=prn,org=columns,julian=jdate) 1 412 open high low close adjclose volume

cal(julian=jdate)

All the price series are then transformed to logs.

dofor s = open high low close adjclose

set s = log(s{0})

end dofor s

The calculations are based upon the assumption that the log prices follow a (locally) constant-variance, driftless Brownian motion. What is being estimated is the variance at the daily time frame. The available data are the open, high, low and close prices, and the volatility estimate for an entry can use any or all of them (the more that are used, the more efficient the estimate). Under this assumption, with x = log price, for $$t>s$$

$$x(t) - x(s) \sim N\left( {0,\sigma ^2 (t - s)} \right)$$

which implies that

$$E\left[ {\frac{{\left( {x(t) - x(s)} \right)^2 }}{{(t - s)}}} \right] = \sigma ^2$$

so the squared change, normalized by the length of the interval, is an unbiased estimator of the variance.
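As a quick sanity check on this identity, the sketch below (Python with NumPy, independent of the RATS program) simulates Brownian increments and verifies that the squared change divided by the interval length averages out to the true variance. The variance, interval length, sample size and seed are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma2 = 0.04            # true daily variance of the log price (assumed)
dt = 1.0 + 17.5 / 24.0   # an arbitrary interval length (t - s), in days
n = 100_000              # number of simulated increments

# x(t) - x(s) ~ N(0, sigma2 * (t - s)) under the Brownian motion assumption
increments = rng.normal(0.0, np.sqrt(sigma2 * dt), size=n)

# squared change normalized by interval length: unbiased for sigma2
estimate = np.mean(increments**2 / dt)
print(estimate)   # close to 0.04
```

The average is close to 0.04 regardless of the interval length chosen, which is exactly why the per-entry estimators below can mix changes measured over different spans once each is divided by its own span.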

The different estimates are labeled VOL0 through VOL6, following the labeling used in the paper. The simplest is what the paper calls the "classical" estimate, which just uses the change in the closing price from one period to the next. For some assets you may not have any other information; if so, this is the only estimate of this type available.

set vol0 = (close-close{1})^2

If you also have the opening price, you have two pieces of information: the overnight change and the within-day change. Again, assuming the log price process is uniform through the day (even when markets are closed), the estimates from the two changes need to be normalized by the fraction of the day that each represents. The authors use F for the fraction of the day when the market is closed; for the US stock market this is 17.5 hours (trading hours are 0930-1600 Eastern time), so F = 17.5/24.

compute f=17.5/24.0

Because of the properties of the Brownian motion, the variance estimates from the overnight and within-day changes have the same "efficiency" once normalized, so they are equally weighted:

compute a1=.5

set vol1 = a1*(open-close{1})^2/f+(1-a1)*(close-open)^2/(1-f)

If you also have the high and low within the period, a (substantially) more efficient estimator is available using those:

set vol2 = (high-low)^2/(4*log(2))

The overnight change provides information independent of the within-day range, so the two can be combined to give a still more efficient estimator. The optimal linear combination weights heavily towards the latter:

compute a3=.17

set vol3 = a3*(open-close{1})^2/f+(1-a3)*(high-low)^2/(4*log(2))

The authors then derive an "optimal" use of all four pieces of within-day information. They give two versions of this, the second being slightly simpler and nearly as efficient:

set vol4 = c=(close-open),u=(high-open),d=(low-open),.511*(u-d)^2-.019*(c*(u+d)-2*u*d)-.383*c^2

set vol5 = c=(close-open),u=(high-open),d=(low-open),.5*(u-d)^2-(2*log(2)-1)*c^2

(These use subcalculations for C, U and D within the SET so that the final formula matches the paper's notation.)
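To make the formulas concrete, here is a small Python sketch (independent of the RATS code) that evaluates the two all-data estimators on a single hypothetical bar of log prices; the price values are made up purely for illustration.

```python
import math

# Hypothetical log prices for one trading day (illustrative values only)
o, h, l, close = 4.60, 4.63, 4.58, 4.62

# The paper's notation: all changes measured from the open
c = close - o   # within-day change
u = h - o       # high relative to open
d = l - o       # low relative to open

# "Best" analytic scale-invariant estimator (their first version, VOL4)
vol4 = 0.511 * (u - d)**2 - 0.019 * (c * (u + d) - 2 * u * d) - 0.383 * c**2

# Slightly simpler, nearly as efficient second version (VOL5)
vol5 = 0.5 * (u - d)**2 - (2 * math.log(2) - 1) * c**2

print(vol4, vol5)
```

On this bar the two versions agree to about three significant figures, which reflects how little efficiency the simpler form gives up.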

Finally, this combines the overnight estimate with the all-data within-day estimate. The weights tilt even more strongly towards the within-day term, since VOL4 is more efficient than the estimate based on the high-low range alone:

compute a6=.12

set vol6 = a6*(open-close{1})^2/f+(1-a6)*vol4/(1-f)

Correcting for Spacing Between Observations

All of the calculations above were based upon the assumption that the entries are equally spaced, which would be appropriate for (for instance) weekly data. With trading-day-only data, that isn't realistic. Adjusting for the length of time the markets are closed requires adjusting any of the calculations which involve prices from different periods: thus VOL0, VOL1, VOL3 and VOL6. The other three use only within-day data, so they aren't affected. For the calculations which use the open versus the previous close, the time gap from the previous close is the number of whole days actually between the two entries (0 for consecutive days, 2 for Friday to Monday) plus the fraction F of the trading day itself during which the market is closed.

The corrected calculations are:

set cvol0 = (close-close{1})^2/(jdate-jdate{1})

set cvol1 = a1*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a1)*(close-open)^2/(1-f)

set cvol3 = a3*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a3)*(high-low)^2/(4*log(2))

set cvol6 = a6*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a6)*vol4/(1-f)
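The divisor on the overnight term can be spelled out with a small Python helper (hypothetical, just to illustrate the arithmetic): the overnight span is the Julian-date difference, minus the trading day itself, plus the closed fraction F of a trading day.

```python
F = 17.5 / 24.0   # fraction of a trading day the market is closed

def overnight_span(jdate, jdate_prev):
    """Length of the overnight period, in days, between two trading entries."""
    return jdate - jdate_prev - 1 + F

# Consecutive trading days (e.g. Tuesday after Monday):
# just the closed part of one day
print(overnight_span(100, 99))    # F, about 0.729

# Monday after Friday: two full weekend days plus the closed part of a day
print(overnight_span(103, 100))   # 2 + F, about 2.729
```

This is exactly the `jdate-jdate{1}-1+f` expression in the corrected SET instructions; under the equal-spacing assumption it collapses back to F, reproducing the uncorrected versions.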

This graphs three of the corrected series over a relatively short range to show the differences. (If you graph the data over a long range, you won't see any detail).

graph(key=below,klabels=||"Classical","Open-Close","All Data"||) 3

# cvol0 51 75

# cvol1 51 75

# cvol6 51 75

Full Program

*
* VolatilityEstimates.rpf
* Shows various methods of estimating volatility from historical data.
*
* Methods from Garman, M. B. and M. J. Klass (1980), "On the Estimation
* of Security Price Volatilities from Historical Data", Journal of
* Business, vol 53, no 1, pp 67-78.

* Data from Yahoo Finance
*
open data "Russell 2000.csv"
data(format=prn,org=columns,julian=jdate) 1 412 open high low close adjclose volume
cal(julian=jdate)
*
* All the price series are transformed to logs
*
dofor s = open high low close adjclose
   set s = log(s{0})
end dofor s
*
* This is the fraction of the day the U.S. stock market is closed
*
compute f=17.5/24.0
*
* These calculations treat the entries as equally spaced (which is not
* true, as there are weekends and non-trading days). Calculations which
* take the spacing between days into account come later.
*
set vol0 = (close-close{1})^2
*
* This is the optimal weight for combining the overnight change with the
* within-day change.
*
compute a1=.5
set vol1 = a1*(open-close{1})^2/f+(1-a1)*(close-open)^2/(1-f)
set vol2 = (high-low)^2/(4*log(2))
*
* This is the optimal weight for combining the overnight change with the
* daily high-low gap. (It's based upon the relative efficiencies of the
* two estimators).
*
compute a3=.17
set vol3 = a3*(open-close{1})^2/f+(1-a3)*(high-low)^2/(4*log(2))
*
* These are calculated using only the information from within a day.
* These are the "best" analytic scale-invariant estimators.
*
set vol4 = c=(close-open),u=(high-open),d=(low-open),.511*(u-d)^2-.019*(c*(u+d)-2*u*d)-.383*c^2
set vol5 = c=(close-open),u=(high-open),d=(low-open),.5*(u-d)^2-(2*log(2)-1)*c^2
*
* This combines the overnight volatility estimate with the within day estimate
*
compute a6=.12
set vol6 = a6*(open-close{1})^2/f+(1-a6)*vol4/(1-f)
*
* Correcting the overnight calculations for the date gap between entries
*
set cvol0 = (close-close{1})^2/(jdate-jdate{1})
set cvol1 = a1*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a1)*(close-open)^2/(1-f)
set cvol3 = a3*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a3)*(high-low)^2/(4*log(2))
set cvol6 = a6*(open-close{1})^2/(jdate-jdate{1}-1+f)+(1-a6)*vol4/(1-f)
*
graph(key=below,klabels=||"Classical","Open-Close","All Data"||) 3
# cvol0 51 75
# cvol1 51 75
# cvol6 51 75