NPREG Instruction
NPREG(options) Yseries Xseries start end grid fit
NPREG computes a flexible nonparametric fit of \(Y_t = f\left( X_t \right)\) for a single \(Y\) series and a single \(X\) series. Three methods are available: the Nadaraya-Watson kernel estimator, LOWESS (locally weighted scatterplot smoother), and nearest neighbor.
Wizard
The Statistics—Nonparametric Regression Wizard provides a dialog-driven interface to the NPREG instruction.
Parameters
Yseries | dependent variable
Xseries | explanatory variable
start, end | range to estimate, defaults to SMPL range if one has been set, otherwise the maximum range over which both series are defined.
grid | series of X values at which the fit is computed
fit | series of fitted values corresponding to the grid series
Options
METHOD=[NADARAYA]/LOWESS/NN
METHOD=NADARAYA does the Nadaraya-Watson kernel estimator, METHOD=LOWESS does LOWESS, and METHOD=NN does nearest neighbor smoothing.
GRID=[AUTOMATIC]/INPUT
MAXGRID=maximum number of grid points for GRID=AUTOMATIC [100]
GRID=AUTOMATIC has NPREG generate the grid points for the fit. These range from the lowest to the highest values attained by the actual X series, with the number of points being given by the MAXGRID option. To control the points yourself, use GRID=INPUT, in which case the grid series should be filled in advance with your settings. Usually, an equally spaced grid is handy if you’re mainly interested in examining the shape of the function f. If the NPREG is part of a more complex calculation, the grid series will usually be the X series itself.
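As a rough illustration of what an automatic grid could look like, the sketch below builds one in plain Python (not RATS code). The text above only says that the points run from the lowest to the highest X value and that MAXGRID sets their number; equal spacing is an assumption made here, and `automatic_grid` is a hypothetical helper name.

```python
def automatic_grid(x, maxgrid=100):
    """Return maxgrid points from min(x) to max(x).

    Equal spacing is an assumption of this sketch, not a documented
    property of GRID=AUTOMATIC.
    """
    lo, hi = min(x), max(x)
    if maxgrid < 2:
        return [lo]
    step = (hi - lo) / (maxgrid - 1)
    return [lo + i * step for i in range(maxgrid)]


x = [3.0, 1.0, 7.0, 5.0]
grid = automatic_grid(x, maxgrid=4)   # four points spanning 1.0 to 7.0
```

Passing the X series itself as the grid, as with GRID=INPUT, corresponds to skipping this step and evaluating the fit at the observed values.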
TYPE=[EPANECHNIKOV]/TRIANGULAR/GAUSSIAN/LOGISTIC/FLAT/PARZEN
BANDWIDTH=kernel bandwidth
TYPE selects the kernel type for the Nadaraya-Watson estimator. BANDWIDTH specifies the bandwidth for the kernel. The default for BANDWIDTH is:
\(0.79\,\mathrm{IQR}/N^{1/5}\)
where IQR is the interquartile range of the series and N is the number of data points.
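The default bandwidth formula can be checked by hand with a few lines of plain Python (not RATS code). This is only a sketch: `default_bandwidth` is a hypothetical helper, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since packages differ in how they interpolate quartiles and the text does not say which rule NPREG uses.

```python
import statistics


def default_bandwidth(series):
    """Rule-of-thumb bandwidth 0.79 * IQR / N**(1/5)."""
    n = len(series)
    # Quartile interpolation rule is an assumption of this sketch.
    q1, _, q3 = statistics.quantiles(series, n=4, method="inclusive")
    return 0.79 * (q3 - q1) / n ** 0.2


series = [float(i) for i in range(1, 33)]   # 1, 2, ..., 32
h = default_bandwidth(series)
```

Because N enters only through its fifth root, the bandwidth shrinks slowly as the sample grows: multiplying the sample size by 32 only halves it.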
SMOOTHING=smoothing scale factor [1]
You can supply a positive real value to adjust the amount of smoothing. Use a value bigger than 1 for more smoothing than the default, or a value less than 1 for less smoothing.
FRACTION=fraction of data range included in a LOWESS/NN fit [.5]
The larger the value for this option, the “stiffer” the function.
SMPL=Standard SMPL option [unused]
Omits from the estimation observations where the SMPL series or expression is zero or NA.
WEIGHT=Standard WEIGHT option [unused]
Use this option if you want to provide different weights for each observation.
Technical Information
The Nadaraya-Watson estimator (METHOD=NADARAYA) is:
\(\hat f\left( x \right) = \frac{{\sum {K\left( {\left( {{x_i} - x} \right)/h} \right){y_i}} }}{{\sum {K\left( {\left( {{x_i} - x} \right)/h} \right)} }}\)
where K is the kernel function and h is the bandwidth. The kernels have the forms:
EPANECHNIKOV | \(K\left( v \right) = 0.75\left( 1 - v^2 \right)\) if \(\left| v \right| \le 1\), 0 otherwise
TRIANGULAR | \(K\left( v \right) = 1 - \left| v \right|\) if \(\left| v \right| \le 1\), 0 otherwise
GAUSSIAN | \(K\left( v \right) = \frac{1}{\sqrt{2\pi}}\exp\left( \frac{-v^2}{2} \right)\)
LOGISTIC | \(K\left( v \right) = e^v/\left( 1 + e^v \right)^2\)
FLAT | \(K\left( v \right) = 0.5\) if \(\left| v \right| \le 1\), 0 otherwise
PARZEN | \(K\left( v \right) = 4/3 - 8v^2 + 8\left| v \right|^3\) if \(\left| v \right| \le 0.5\); \(8\left( 1 - \left| v \right| \right)^3/3\) if \(0.5 < \left| v \right| \le 1\); 0 otherwise
As you increase the bandwidth, the estimated function becomes smoother, but is less able to detect sharp features. A shorter bandwidth leads to a more ragged estimated function, but sharp features will be more apparent.
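To make the estimator and the bandwidth trade-off concrete, here is a minimal plain-Python sketch (not RATS code) of the Nadaraya-Watson formula with two of the kernels listed above. The function names and the toy data are hypothetical, and returning NaN when no observation falls inside the kernel's support is a choice made for this sketch, not documented NPREG behavior.

```python
import math


def epanechnikov(v):
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def gaussian(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, y, grid, h, kernel=epanechnikov):
    """Kernel-weighted average of y at each grid point, bandwidth h."""
    fit = []
    for g in grid:
        w = [kernel((xi - g) / h) for xi in x]
        total = sum(w)
        if total > 0.0:
            fit.append(sum(wi * yi for wi, yi in zip(w, y)) / total)
        else:
            # No data inside the kernel support: the ratio is undefined.
            fit.append(float("nan"))
    return fit


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
narrow = nadaraya_watson(x, y, [2.0], h=0.5)    # only the point at x=2 gets weight
wide = nadaraya_watson(x, y, [2.0], h=100.0)    # weights nearly equal
```

With the narrow bandwidth the fit at 2.0 reproduces the single nearby observation, while the very wide bandwidth pulls the fit toward the overall mean of y, which is the smoothing behavior described above.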
LOWESS (METHOD=LOWESS) is computed by doing a weighted least squares regression of y on a constant and x, for the sample which includes the requested fraction of the data closest to each value of x. The weights are:
\({\left( {1 - {{\left( {\frac{{\left| {{x_i} - x} \right|}}{{D\left( x \right)}}} \right)}^3}} \right)^3}\)
where \(D(x)\) is the range of the sample X’s used in the fit at x. \(\hat f\left( x \right)\) is the intercept of this regression.
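The local regression can be sketched in plain Python (not RATS code) as follows. Several details are assumptions of this sketch rather than documented NPREG behavior: the number of points is taken as the rounded fraction of the sample size, \(D(x)\) is taken to be the largest distance from x among the selected points, and the regressor is centered at x so that the intercept is the fitted value at x. `lowess_point` is a hypothetical helper name.

```python
def lowess_point(x, y, x0, fraction=0.5):
    """Local linear fit at x0 with tricube weights on the nearest points."""
    k = max(2, int(round(fraction * len(x))))          # assumed rounding rule
    nearest = sorted(zip(x, y), key=lambda p: abs(p[0] - x0))[:k]
    d = max(abs(xi - x0) for xi, _ in nearest)         # assumed meaning of D(x)
    if d > 0.0:
        w = [(1.0 - (abs(xi - x0) / d) ** 3) ** 3 for xi, _ in nearest]
    else:
        w = [1.0] * len(nearest)
    if sum(w) == 0.0:      # every selected point sits exactly at distance d
        w = [1.0] * len(nearest)
    sw = sum(w)
    # Weighted least squares of y on a constant and z = x - x0.
    zbar = sum(wi * (xi - x0) for wi, (xi, _) in zip(w, nearest)) / sw
    ybar = sum(wi * yi for wi, (_, yi) in zip(w, nearest)) / sw
    szz = sum(wi * (xi - x0 - zbar) ** 2 for wi, (xi, _) in zip(w, nearest))
    szy = sum(wi * (xi - x0 - zbar) * (yi - ybar)
              for wi, (xi, yi) in zip(w, nearest))
    slope = szy / szz if szz > 0.0 else 0.0
    return ybar - slope * zbar                         # intercept = fit at x0


x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]       # exactly linear toy data
fit_mid = lowess_point(x, y, 2.5)
```

Because each local regression includes a slope term, exactly linear data are reproduced without bias, including at the edges of the sample.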
Finally, METHOD=NN takes the simple average of y for the requested fraction of the data closest to each value of x.
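The nearest neighbor smoother is simple enough to write out directly. The plain-Python sketch below (not RATS code) assumes the number of neighbors is the rounded fraction of the sample size and that ties in distance are broken by original order; both are assumptions of this illustration, and `nn_point` is a hypothetical helper name.

```python
def nn_point(x, y, x0, fraction=0.5):
    """Simple average of y over the nearest fraction of the data to x0."""
    k = max(1, int(round(fraction * len(x))))     # assumed rounding rule
    nearest = sorted(zip(x, y), key=lambda p: abs(p[0] - x0))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
local = nn_point(x, y, 2.0, fraction=0.5)    # averages y at x = 1, 2, 3
overall = nn_point(x, y, 2.0, fraction=1.0)  # averages every observation
```

Raising the fraction toward 1 moves every fitted value toward the overall mean of y, which is the sense in which a larger FRACTION gives a stiffer function.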
Examples
This estimates two flexible fit functions, one using LOWESS, the other using Nadaraya-Watson. This is from LOWESS.RPF.
npreg(method=lowess,frac=.33) y x / xv yv
scatter(style=dots,overlay=line,ovsame,header="LOWESS Fit, Frac=.33") 2
# x y
# xv yv
*
npreg(method=nadaraya,grid=input) y x / xv yn
scatter(style=dots,overlay=line,ovsame,header="Kernel Fit, Default Bandwidth") 2
# x y
# xv yn
This estimates a flexible fit using an input bandwidth with a logistic kernel. It is from example GRN7P214.RPF, one of the Greene (2012) examples.
npreg(bandwidth=2000,type=logit) avgcost output 6 111 kxfit kyfit
scatter(style=dots,overlay=line,ovsame,$
footer="Figure 7.8 Non-Parametric Regression for AVGCOST",$
hlabel="Output",vlabel="E[AvgCost|O]") 2
# output avgcost 6 111
# kxfit kyfit
Copyright © 2025 Thomas A. Doan