DENSITY Instruction

DENSITY( options ) series start end grid density

DENSITY estimates the density function for a series of data. This can be done using kernel methods or by binning and counting for a histogram.

Wizard

The Statistics>(Kernel) Density Wizard provides a dialog-driven interface to the DENSITY instruction.

Parameters

series	series for which you want to compute the density function
start, end	range to use. Defaults to the defined range of series
grid	(input or output) series of points at which the density is estimated
density	(output) series for the estimated density corresponding to the grid points. The grid and density series are defined over entries 1 through the number of grid points. How the grid is set depends upon the GRID and MAXGRID options.

Options

TYPE=[EPANECHNIKOV]/TRIANGULAR/GAUSSIAN/LOGISTIC/FLAT/PARZEN/HISTOGRAM

COUNTS/[NOCOUNTS]

Determines the type of density function that will be estimated. If you use HISTOGRAM, the counts for each "bin" are normally divided by the number of data points times the width of the bin to return a density estimate. Use COUNTS if you just want the counts for each bin.

BANDWIDTH=kernel bandwidth

The default for BANDWIDTH is:

$0.79\,N^{-1/5}\,\mathrm{IQR}$

where IQR is the interquartile range of the series and N is the number of data points. This has some optimality properties, but in practice seems to be too narrow. You can use the SMOOTHING option to increase (or decrease) it without needing to know its value, which depends upon the data.
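As a rough illustration of the default bandwidth formula, the following Python/NumPy sketch (not RATS code) evaluates 0.79 N^(-1/5) IQR for a data vector. The function name `default_bandwidth` is hypothetical. Two things are assumptions rather than documented behavior: that SMOOTHING simply multiplies the default bandwidth, and that the quartiles use NumPy's default interpolation rule, which may differ from the fractile rule RATS uses.

```python
import numpy as np

def default_bandwidth(x, smoothing=1.0):
    """Return smoothing * 0.79 * N**(-1/5) * IQR for the data in x.

    Assumption: SMOOTHING acts as a plain multiplier on the default bandwidth.
    """
    x = np.asarray(x, dtype=float)
    # Quartiles via NumPy's default (linear) interpolation; RATS may differ.
    q25, q75 = np.percentile(x, [25, 75])
    return smoothing * 0.79 * x.size ** (-0.2) * (q75 - q25)

x = np.arange(1.0, 33.0)                   # 32 observations: 1, 2, ..., 32
h = default_bandwidth(x)                   # default bandwidth
h_wide = default_bandwidth(x, smoothing=1.5)  # 50% more smoothing
```

Because the interquartile range depends on the data, scaling by a smoothing factor is a convenient way to widen or narrow the kernel without computing the bandwidth yourself.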

GRID=[AUTOMATIC]/INPUT

If AUTOMATIC, the grid series runs in equal steps from the 1%-ile to the 99%-ile of the input series. If INPUT, you fill in the grid series with whatever values you want prior to using the DENSITY instruction.

MAXGRID=number of grid points [100 except for TYPE=HISTOGRAM]

If GRID=AUTOMATIC (the default), MAXGRID gives the number of equally spaced points at which the density is estimated.
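The automatic grid can be pictured with a short Python/NumPy sketch (not RATS code). The function name `automatic_grid` is hypothetical, and the percentile interpolation is NumPy's default, which is an assumption and may not match the RATS fractile calculation exactly.

```python
import numpy as np

def automatic_grid(x, maxgrid=100):
    """Equally spaced grid from the 1st to the 99th percentile of x."""
    # NumPy's default percentile interpolation; RATS may use a different rule.
    lo, hi = np.percentile(np.asarray(x, dtype=float), [1, 99])
    return np.linspace(lo, hi, maxgrid)   # maxgrid points, endpoints included

x = np.arange(0.0, 101.0)                 # 101 observations: 0, 1, ..., 100
grid = automatic_grid(x, maxgrid=50)
```

Trimming the grid at the 1st and 99th percentiles keeps a few extreme observations from stretching the grid over regions with almost no data.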

WEIGHT=series of weights for the data points [not used]

Use this option if you want to provide different weights for each observation.

DERIVATIVE=(output) series of estimated derivatives [not used]

This saves the estimated derivatives of the density function into a series. This matches up, point for point, with the grid and density series. It requires TYPE=GAUSSIAN, as the other kernel types aren’t differentiable.
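For intuition about what a derivative series contains, the Python/NumPy sketch below (not RATS code) differentiates the Gaussian kernel estimator from the Description section with respect to the grid point u. Since the Gaussian kernel satisfies K'(v) = -v K(v), the derivative is the weighted sum of -v K(v) divided by h squared times the sum of the weights. The function name is hypothetical, and whether DENSITY uses exactly this expression internally is an assumption.

```python
import numpy as np

def gaussian_density_and_derivative(x, grid, h, w=None):
    """Gaussian kernel density and its analytic derivative at each grid point."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    v = (grid[:, None] - x[None, :]) / h          # one row per grid point
    phi = np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
    density = (phi * w).sum(axis=1) / (h * w.sum())
    # d/du K((u - x)/h) = K'(v)/h with K'(v) = -v K(v) for the Gaussian kernel
    derivative = (-v * phi * w).sum(axis=1) / (h**2 * w.sum())
    return density, derivative

x = np.array([-1.0, 0.0, 1.0])
grid = np.linspace(-3.0, 3.0, 7)
f, df = gaussian_density_and_derivative(x, grid, h=0.8)
```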

SMOOTHING=smoothing scale factor [1]

You can supply a real value (bigger than 0) to adjust the amount of smoothing. Use a value bigger than 1 for more smoothing than the default, values less than 1 for less smoothing.

SMPL=standard SMPL option [unused]

SHUFFLE=SERIES[INTEGER] with entry remapping [unused]

[PRINT]/NOPRINT

If PRINT, DENSITY produces a table of grid values and the estimated density at each point.

Description

For types other than HISTOGRAM, DENSITY estimates the density function for a series of data x, by computing at each point u in the grid:

$\hat f(u) = \frac{\sum_{t=1}^{T} w_t K\left(\frac{u - x_t}{h}\right)}{h \sum_{t=1}^{T} w_t}$

where K is the kernel function, h the bandwidth and w are the weights, which, by default, are 1 for all t. The kernel types take the following forms:

EPANECHNIKOV	$K(v) = 0.75\,(1 - v^2)$ if $|v| \le 1$, 0 otherwise
TRIANGULAR	$K(v) = 1 - |v|$ if $|v| \le 1$, 0 otherwise
GAUSSIAN	$K(v) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{v^2}{2}\right)$
LOGISTIC	$K(v) = \frac{e^v}{(1 + e^v)^2}$
FLAT	$K(v) = 0.5$ if $|v| \le 1$, 0 otherwise
PARZEN	$K(v) = 4/3 - 8v^2 + 8|v|^3$ if $|v| \le 0.5$, $8(1 - |v|)^3/3$ if $0.5 \le |v| \le 1$, 0 otherwise

As you increase the bandwidth, you will get a smoother estimated density function, but you will be less able to detect sharp features. A smaller bandwidth leads to a more ragged estimated density function, but sharp features, such as a truncation at one end, will be more apparent.

For TYPE=HISTOGRAM, the grid becomes a series of “bins” centered at each grid point. DENSITY counts the number of data points which fall in each bin. If you use COUNTS, these raw counts will be the values returned. Otherwise, the counts are divided by the number of data points times the bin width to produce an estimate of the density.
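The histogram calculation can be sketched in Python/NumPy for the simple case of an equally spaced grid (not RATS code; the function name `histogram_on_grid` is hypothetical). The sketch assumes each bin extends half a grid step on either side of its grid point and that the divisor is the total number of data points. How DENSITY treats observations on a bin boundary, observations outside all bins, or unequally spaced input grids is not covered here.

```python
import numpy as np

def histogram_on_grid(x, grid, counts=False):
    """Counts or density for bins centered on an equally spaced grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    width = grid[1] - grid[0]                         # common bin width
    # Bin edges half a step either side of each grid point (assumption).
    edges = np.concatenate([grid - width / 2.0, [grid[-1] + width / 2.0]])
    c, _ = np.histogram(x, bins=edges)
    if counts:
        return c                                      # raw counts (COUNTS)
    return c / (x.size * width)                       # density (NOCOUNTS)

x = np.array([1.0, 2.0, 2.5, 7.0, 12.0, 14.0, 26.0])
grid = np.array([5.0, 15.0, 25.0])                    # bins 0-10, 10-20, 20-30
raw = histogram_on_grid(x, grid, counts=True)
dens = histogram_on_grid(x, grid)
```

Dividing by the number of data points times the bin width makes the bar areas sum to one when every observation falls in some bin, which is what allows the result to be compared with a kernel estimate.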

Variables Defined

%EBW	the computed bandwidth (REAL)

Examples

This computes and graphs density functions for three sets of statistics generated by a simulation process. Each uses an automatic grid with the default 100 grid points.

density(smoothing=1.5) inter 1 ndraws ginter finter

density(smoothing=1.5) coeff1 1 ndraws gcoeff1 fcoeff1

density(smoothing=1.5) sums 1 ndraws gsums fsums

scatter(style=lines,window="Posterior for Intercept")

# ginter finter

scatter(style=lines,window="Posterior for Lag 0")

# gcoeff1 fcoeff1

scatter(style=lines,window="Posterior for Sum")

# gsums fsums

This computes a heavily smoothed density function across an input grid created using @GRIDSERIES.

@gridseries(from=-100,to=+100,n=400) xsr

density(smoothing=3.0,grid=input) sacratios 1 nboot xsr fsr

scatter(style=lines,header="Frequency Distribution for the Cecchetti Model")

# xsr fsr

This creates two density functions: the first uses an automatically generated grid, and the second reuses that grid as its input grid.

density(grid=automatic,maxgrid=100,smoothing=1.5) b2draws / bx fxn

scatter(style=line,vmin=0.0,$

footer="Figure 2.2 Distribution of b2 in the Monte Carlo experiment")

# bx fxn

density(grid=input,maxgrid=100,smoothing=1.5) b2drawsln / bx fxln

This computes a histogram for an income series using an input set of interval midpoints which give wider intervals for the higher incomes.

data(unit=input) 1 12 gridpts

5 15 25 35 45 55 65 85 105 135 165 235

density(type=histogram,grid=input,print) income / gridpts idensity
