RATS 10.1

NPREG(options)  Yseries Xseries start end grid fit

NPREG computes a non-parametric flexible fit of \({Y_t} = f\left( {{X_t}} \right)\) for a single \(Y\) and \(X\) series. Three methods are available: the Nadaraya-Watson kernel estimator, LOWESS (locally weighted scatterplot smoother) and nearest neighbor smoothing.

Wizard

The Statistics—Nonparametric Regression Wizard provides a dialog-driven interface to the NPREG instruction.

Parameters

Yseries

dependent variable

Xseries

explanatory variable

start, end

range to estimate. This defaults to the SMPL range if one has been set, otherwise to the maximum range over which both series are defined.

grid

series of X values at which the fit is computed

fit

series of fitted values corresponding to the grid series

Options

METHOD=[NADARAYA]/LOWESS/NN

METHOD=NADARAYA does the Nadaraya-Watson kernel estimator, METHOD=LOWESS does LOWESS, and METHOD=NN does nearest neighbor smoothing.
 

GRID=[AUTOMATIC]/INPUT

MAXGRID=maximum number of grid points for GRID=AUTOMATIC [100]

GRID=AUTOMATIC has NPREG generate the grid points for the fit. These range from the lowest to the highest values attained by the actual X series, with the number of points given by the MAXGRID option. To control the points yourself, use GRID=INPUT, in which case the grid series should be filled in advance with your settings. Usually, an equally spaced grid is handy if you’re mainly interested in examining the shape of the function f. If the NPREG is part of a more complex calculation, the grid series will usually be the X series itself.
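To make the grid idea concrete, here is a minimal sketch in Python (not RATS code) of a grid running from the lowest to the highest X value. The function name `automatic_grid` is hypothetical, and equal spacing between the points is an assumption; the description above only fixes the endpoints and the number of points.

```python
def automatic_grid(x, maxgrid=100):
    """Points from min(x) to max(x) inclusive; equal spacing is an assumption."""
    lo, hi = min(x), max(x)
    if maxgrid < 2 or hi == lo:
        return [lo]
    step = (hi - lo) / (maxgrid - 1)
    return [lo + i * step for i in range(maxgrid)]


x = [3.0, 1.0, 4.0, 1.5, 9.0]          # hypothetical X values
grid = automatic_grid(x, maxgrid=5)    # five points spanning 1.0 to 9.0
full = automatic_grid(x)               # default of 100 points
```

With GRID=INPUT the corresponding step would be to fill the grid series yourself, for example with the X values themselves.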

 

TYPE=[EPANECHNIKOV]/TRIANGULAR/GAUSSIAN/LOGISTIC/FLAT/PARZEN

BANDWIDTH=kernel bandwidth

TYPE selects the kernel type for the Nadaraya-Watson estimator. BANDWIDTH specifies the bandwidth for the kernel. The default for BANDWIDTH is:

\(\left( {0.79{\kern 1pt} {\kern 1pt} IQR} \right)/{N^{{1 / 5}}}\)

where IQR is the interquartile range of the series and N is the number of data points.
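As a worked illustration of the default bandwidth formula, the following Python sketch (not RATS code) evaluates it for a small made-up sample. It assumes the IQR is taken over the X series and uses one particular quartile convention (`method="inclusive"`); RATS may compute the quartiles slightly differently, so treat the exact number as illustrative only.

```python
import statistics


def default_bandwidth(x):
    """0.79 * IQR / N**(1/5); the quartile convention is an assumption."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return 0.79 * (q3 - q1) / len(x) ** 0.2


x = list(range(1, 33))       # 32 hypothetical observations: 1, 2, ..., 32
h = default_bandwidth(x)     # IQR = 15.5 and 32**(1/5) = 2, so h is about 6.1225
```

Because N enters only through its fifth root, the default bandwidth shrinks slowly as the sample grows.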

 

SMOOTHING=smoothing scale factor [1]

You can supply a real value (bigger than 0) to adjust the amount of smoothing. Use a value bigger than 1 for more smoothing than the default, values less than 1 for less smoothing.

 

FRACTION=fraction of data range included in a LOWESS/NN fit[.5]

The larger the value of this option, the “stiffer” the fitted function.

 

SMPL=Standard SMPL option [unused]

Omits from the estimation observations where the SMPL series or expression is zero or NA.

 

WEIGHT=Standard WEIGHT option [unused]

Use this option if you want to provide different weights for each observation.

Technical Information

The Nadaraya-Watson estimator (METHOD=NADARAYA) is:

 

\(\hat f\left( x \right) = \frac{{\sum {K\left( {\left( {{x_i} - x} \right)/h} \right){y_i}} }}{{\sum {K\left( {\left( {{x_i} - x} \right)/h} \right)} }}\)

 

where K is the kernel function and h is the bandwidth. The kernels have the forms:

EPANECHNIKOV

\(K\left( v \right) = 0.75{\kern 1pt} {\kern 1pt} \left( {1 - {v^2}} \right)\) if \(\left| {{\kern 1pt} {\kern 1pt} v{\kern 1pt} {\kern 1pt} } \right| \le 1\), 0 otherwise

TRIANGULAR

\(K\left( v \right) = \left( {1 - \left| v \right|} \right)\) if \(\left| {{\kern 1pt} {\kern 1pt} v{\kern 1pt} {\kern 1pt} } \right| \le 1\), 0 otherwise

GAUSSIAN

\(K\left( v \right) = \frac{1}{{\sqrt {2\pi } }}{\kern 1pt} {\kern 1pt} {\kern 1pt} \exp {\kern 1pt} {\kern 1pt} \left( {\frac{{ - {v^2}}}{2}} \right)\)

LOGISTIC

\(K\left( v \right) = {e^v}/{\left( {1 + {e^v}} \right)^2}\)

FLAT

\(K\left( v \right) = 0.5\) if \(\left| {{\kern 1pt} {\kern 1pt} v{\kern 1pt} {\kern 1pt} } \right| \le 1\), 0 otherwise

PARZEN

\(K\left( v \right) = 4/3 - 8{v^2} + 8{\left| v \right|^3}\) if \(\left| v \right| \le 0.5\), \(8{\left( {1 - \left| v \right|} \right)^3}/3\) if \(0.5 < \left| v \right| \le 1\), 0 otherwise

 

As you increase the bandwidth, the estimated function becomes smoother, but is less able to detect sharp features. A shorter bandwidth leads to a more ragged estimated function, but sharp features will be more apparent.
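The estimator and the effect of the bandwidth can be sketched in a few lines of Python (not RATS code). The function names are hypothetical, only the Epanechnikov and Gaussian kernels from the list above are included, and returning NaN when no observation receives positive weight is an assumption rather than documented NPREG behavior.

```python
import math


def epanechnikov(v):
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def gaussian(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, y, x0, h, kernel=epanechnikov):
    """Kernel-weighted average of y at the point x0 with bandwidth h."""
    w = [kernel((xi - x0) / h) for xi in x]
    total = sum(w)
    if total == 0.0:
        return float("nan")  # assumed handling when every weight is zero
    return sum(wi * yi for wi, yi in zip(w, y)) / total


x = [0.0, 1.0, 2.0, 3.0, 4.0]      # hypothetical data with y = x**2
y = [0.0, 1.0, 4.0, 9.0, 16.0]
narrow = nadaraya_watson(x, y, 2.0, h=0.5)    # only the point at x = 2 has weight
wide = nadaraya_watson(x, y, 2.0, h=10.0)     # close to the overall mean of y
smooth = nadaraya_watson(x, y, 2.0, h=1.0, kernel=gaussian)
```

With the narrow bandwidth the fit at 2 reproduces the local value of y, while the wide bandwidth pulls the fit toward the sample mean, which is the smoothing trade-off described above.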

 

LOWESS (METHOD=LOWESS) is computed by doing a weighted least squares regression of y on a constant and x, for the sample which includes the requested fraction of the data closest to each value of x. The weights are:

 

\({\left( {1 - {{\left( {\frac{{\left| {{x_i} - x} \right|}}{{D\left( x \right)}}} \right)}^3}} \right)^3}\)

 

where \(D(x)\) is the range of the sample X’s used in the fit at x. \(\hat f\left( x \right)\) is the intercept of this regression.
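A rough Python sketch (not RATS code) of a single LOWESS evaluation follows. Several details are assumptions not fixed by the text: the number of neighbors is taken as the fraction of observations rounded up, \(D(x)\) is taken as the largest distance among the neighbors used, and the regressor is centered at x so that the intercept equals the fitted value there. The function name `lowess_at` is hypothetical.

```python
import math


def lowess_at(x, y, x0, fraction=0.5):
    """Tricube-weighted local linear fit at x0; returns the intercept."""
    n = len(x)
    k = max(2, math.ceil(fraction * n))           # neighbor count (rounding assumed)
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
    d = max(abs(x[i] - x0) for i in nearest)      # D(x), assumed: largest distance used
    sw = swz = swzz = swy = swzy = 0.0
    for i in nearest:
        z = x[i] - x0                             # centered, so intercept = fit at x0
        w = (1.0 - (abs(z) / d) ** 3) ** 3 if d > 0.0 else 1.0
        sw += w
        swz += w * z
        swzz += w * z * z
        swy += w * y[i]
        swzy += w * z * y[i]
    if sw == 0.0:                                 # all weights zero: plain average
        return sum(y[i] for i in nearest) / len(nearest)
    det = sw * swzz - swz * swz
    if det <= 1e-12 * sw * swzz:                  # slope not identified: weighted mean
        return swy / sw
    return (swzz * swy - swz * swzy) / det        # intercept of the weighted regression


x = [float(i) for i in range(10)]                 # hypothetical data on a straight line
y = [2.0 * xi + 1.0 for xi in x]
fit = lowess_at(x, y, 4.5)                        # a local linear fit recovers the line
```

Because each local regression is linear, data that lie exactly on a line are reproduced exactly, whatever fraction is used.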

 

Finally, METHOD=NN takes the simple average of y for the requested fraction of the data closest to each value of x.
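For comparison, nearest neighbor smoothing reduces to a plain average, as in this Python sketch (not RATS code). The function name `nn_at` is hypothetical, and rounding the neighbor count up is an assumption.

```python
import math


def nn_at(x, y, x0, fraction=0.5):
    """Simple average of y over the fraction of observations closest to x0."""
    n = len(x)
    k = max(1, math.ceil(fraction * n))           # neighbor count (rounding assumed)
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
    return sum(y[i] for i in nearest) / k


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]                # hypothetical data with a jump
y = [10.0, 12.0, 11.0, 30.0, 31.0, 29.0]
low = nn_at(x, y, 1.0)     # averages the y values at x = 0, 1, 2
high = nn_at(x, y, 4.0)    # averages the y values at x = 3, 4, 5
```

Unlike the kernel and LOWESS fits, every included observation gets the same weight, so the fitted function changes in steps as observations enter and leave the neighborhood.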

Examples

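This estimates two flexible fit functions, one using LOWESS, the other using Nadaraya-Watson. The second NPREG uses GRID=INPUT so that it is evaluated at the same grid (XV) that the first NPREG generated. This is from LOWESS.RPF.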

 

npreg(method=lowess,frac=.33) y x / xv yv

scatter(style=dots,overlay=line,ovsame,header="LOWESS Fit, Frac=.33") 2

# x y

# xv yv

*

npreg(method=nadaraya,grid=input) y x / xv yn

scatter(style=dots,overlay=line,ovsame,header="Kernel Fit, Default Bandwidth") 2

# x y

# xv yn


 

This estimates a flexible fit using an input bandwidth with a logistic kernel. It is from example GRN7P214.RPF from the Greene (2012) examples.

 

npreg(bandwidth=2000,type=logit) avgcost output 6 111 kxfit kyfit

scatter(style=dots,overlay=line,ovsame,$

  footer="Figure 7.8 Non-Parametric Regression for AVGCOST",$

  hlabel="Output",vlabel="E[AvgCost|O]") 2

# output avgcost 6 111

# kxfit  kyfit


 


Copyright © 2025 Thomas A. Doan