Instrumental Variables and Two-Stage Least Squares

Applicability

The RATS instructions LINREG, AR1 (autocorrelation correction), NLLS (non-linear least squares), PREGRESS (panel regression), SUR (linear systems estimation) and NLSYSTEM (non-linear systems estimation) all support instrumental variables estimation. For the first four, RATS does (a variant of) two-stage least squares; for the last two, it is done using the Generalized Method of Moments.

In all cases, you do the instrumenting by setting up the list of available instruments using the instruction INSTRUMENTS, and then including the INST option on the estimation instruction.

The Instruction INSTRUMENTS

RATS uses the INSTRUMENTS instruction to maintain the list of instruments. With smaller models, you probably need to set this just once. With larger simultaneous equations models, you may not have enough observations to use all the exogenous and predetermined variables. If so, you will probably need to change the list for each equation. Use the ADD or DROP options to make small changes to the list. NLSYSTEM has a special option (MASK), which can do a joint estimation with differing sets of instruments.

Note that you must set the instrument list before you do the estimation. A procedure in which the instruments depend upon the parameters being estimated cannot be done (easily) with RATS.

Note that no variable, not even the CONSTANT, is included in an instrument set automatically. Also note that there is no concept of a specific variable or set of variables “instrumenting out” a particular explanatory variable.

Technical Information

RATS does not literally run two sets of regressions, though the result is the same as if it did. Instrumental variables estimation for a linear regression is based upon the assumptions used in "A General Framework":

\begin{equation} y_t = X_t \beta + u_t \label{eq:linreg_basereg} \end{equation}

\begin{equation} EZ_t ^\prime u_t = 0 \end{equation}

where $Z_t$ are the instruments. In solving

\begin{equation} \Theta _T \frac{1}{T}\sum {Z_t ^\prime \left( {y_t - X_t \beta } \right)} = 0 \label{eq:linreg_ivcondition} \end{equation}

the weight matrix is chosen to be

\begin{equation} \Theta _T = \left( {\sum {X_t ^\prime Z_t } } \right) \left( {\sum {Z_t ^\prime Z_t } } \right)^{ - 1} \end{equation}

which is the matrix of regression coefficients of the $X$’s on the $Z$’s. This gives

\begin{equation} \hat \beta = \left( {\left( {\sum {X_t ^\prime Z_t } } \right){\kern 1pt} {\kern 1pt} \left( {\sum {Z_t ^\prime } Z_t } \right)^{ - 1} \left( {\sum {Z_t ^\prime X_t } } \right)} \right)^{ - 1} \left( {\left( {\sum {X_t ^\prime Z_t } } \right)\,\left( {\sum {Z_t ^\prime } Z_t } \right)^{ - 1} \left( {\sum {Z_t ^\prime } y_t } \right)} \right) \end{equation}
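
As a short sketch of why this matches two-stage least squares (the fitted-value notation $\hat X_t$ and the summation index $s$ are introduced here for illustration only): with $\Theta_T = \left( \sum X_t^\prime Z_t \right) \left( \sum Z_t^\prime Z_t \right)^{-1}$, the factor $1/T$ cancels and the condition being solved is

\begin{equation} \left( {\sum {X_t ^\prime Z_t } } \right) \left( {\sum {Z_t ^\prime Z_t } } \right)^{ - 1} \left( {\sum {Z_t ^\prime y_t } } \right) = \left( {\sum {X_t ^\prime Z_t } } \right) \left( {\sum {Z_t ^\prime Z_t } } \right)^{ - 1} \left( {\sum {Z_t ^\prime X_t } } \right) \beta \end{equation}

Solving for $\beta$ gives the estimator above. If $\hat X_t = Z_t \left( \sum_s Z_s^\prime Z_s \right)^{-1} \left( \sum_s Z_s^\prime X_s \right)$ denotes the fitted values from regressing the $X$’s on the $Z$’s, the same estimator can be written as

\begin{equation} \hat \beta = \left( {\sum {\hat X_t ^\prime \hat X_t } } \right)^{ - 1} \left( {\sum {\hat X_t ^\prime y_t } } \right) \end{equation}

which is what an explicit second-stage regression of $y$ on $\hat X$ would produce.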

The calculation of the covariance matrix depends upon the options chosen. If you don’t use ROBUSTERRORS, the assumption is made that

\begin{equation} E(u_t^2 |Z_t ) = \sigma ^2 \end{equation}

which gives us the covariance matrix estimate

\begin{equation} \hat \sigma ^2 \left( {\left( {\sum {X_t ^\prime Z_t } } \right){\kern 1pt} {\kern 1pt} {\kern 1pt} \left( {\sum {Z_t ^\prime } Z_t } \right)^{ - 1} \left( {\sum {Z_t ^\prime X_t } } \right)} \right)^{ - 1} \end{equation}
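
The formulas above can be checked numerically. The following is a minimal sketch in plain Python, not RATS code and not a description of how RATS is implemented. The simulated data, the helper names (`cross`, `matmul`, `solve`) and the divisor $T - K$ used for $\hat \sigma ^2$ are all assumptions of the sketch. It computes the estimator in one step from the cross-product sums, repeats the calculation as two explicit regressions, and then forms the covariance matrix estimate using residuals based on the actual $X$’s.

```python
import random

def cross(a, b):
    """Return a'b = sum_t a[t]' b[t] for matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b))
             for j in range(len(b[0]))] for i in range(len(a[0]))]

def matmul(a, b):
    """Ordinary matrix product a b."""
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for row in a]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

# Simulated data (hypothetical): x is endogenous because u and x share v.
rng = random.Random(0)
T = 200
X, Z, y = [], [], []
for _ in range(T):
    z1, z2, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    u = 0.8 * v + 0.5 * rng.gauss(0, 1)
    x = 1.0 + z1 - 0.5 * z2 + v
    X.append([1.0, x])          # regressors: constant and x
    Z.append([1.0, z1, z2])     # instruments: constant, z1, z2
    y.append([2.0 + 3.0 * x + u])

# One-step calculation from the cross-product sums.
ZZ, ZX, Zy = cross(Z, Z), cross(Z, X), cross(Z, y)
first_stage = solve(ZZ, ZX)     # coefficients from regressing X on Z
A = cross(first_stage, ZX)      # (sum X'Z)(sum Z'Z)^-1 (sum Z'X)
b = cross(first_stage, Zy)      # (sum X'Z)(sum Z'Z)^-1 (sum Z'y)
beta = solve(A, b)

# The same estimate as two explicit regressions.
Xhat = matmul(Z, first_stage)
beta_2s = solve(cross(Xhat, Xhat), cross(Xhat, y))

# Covariance estimate: residuals use the actual X, not the fitted values.
K = len(X[0])
resid = [yt[0] - sum(xv * bv[0] for xv, bv in zip(xt, beta))
         for xt, yt in zip(X, y)]
sigma2 = sum(e * e for e in resid) / (T - K)   # divisor T - K is an assumption
eye = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
cov = [[sigma2 * v for v in row] for row in solve(A, eye)]

print("one-step:", [row[0] for row in beta])
print("two-step:", [row[0] for row in beta_2s])
print("standard errors:", [cov[i][i] ** 0.5 for i in range(K)])
```

The two coefficient vectors agree up to rounding. Note that the second-stage regression on its own would report standard errors based on residuals from the fitted values, which is why the sketch recomputes the residuals with the actual $X$’s.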

More generally, the covariance matrix is estimated according to the formulas in "A General Framework".

If you use the SPREAD option with instrumental variables, all the formulas above are adjusted by inserting the reciprocal of the “spread” series into each of the sums:

\begin{equation} \Theta _T = \left( {\sum {X_t ^\prime \frac{1}{V_t} Z_t } } \right) \left( {\sum {Z_t ^\prime \frac{1}{V_t} Z_t } } \right)^{ - 1} \end{equation}

with similar changes to all of the other sums.
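
One way to read the SPREAD adjustment: inserting $1/V_t$ into every sum is algebraically the same as applying the unweighted formulas to data in which each observation has been divided by $\sqrt{V_t}$. The sketch below illustrates that equivalence in plain Python. It is not RATS code, and the simulated data, the spread series `V` and the helper names (`wcross`, `solve`, `iv_beta`, `scaled`) are hypothetical.

```python
import random

def wcross(a, b, w):
    """Return sum_t w[t] * a[t]' b[t] for matrices stored as lists of rows."""
    return [[sum(wt * ra[i] * rb[j] for wt, ra, rb in zip(w, a, b))
             for j in range(len(b[0]))] for i in range(len(a[0]))]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def iv_beta(X, Z, y, w):
    """IV estimate with the weight w[t] inserted into every cross-product sum."""
    ZZ, ZX, Zy = wcross(Z, Z, w), wcross(Z, X, w), wcross(Z, y, w)
    coef = solve(ZZ, ZX)                 # transpose of the weighted Theta_T
    unit = [1.0] * len(coef)             # plain (unweighted) product coef' m
    A = wcross(coef, ZX, unit)
    b = wcross(coef, Zy, unit)
    return solve(A, b)

# Simulated data (hypothetical) with error variance proportional to V[t].
rng = random.Random(1)
T = 150
X, Z, y, V = [], [], [], []
for _ in range(T):
    z1, z2, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    spread = 0.5 + z1 * z1
    u = spread ** 0.5 * (0.8 * v + 0.5 * rng.gauss(0, 1))
    x = 1.0 + z1 - 0.5 * z2 + v
    X.append([1.0, x])
    Z.append([1.0, z1, z2])
    y.append([2.0 + 3.0 * x + u])
    V.append(spread)

# Formulas with 1/V[t] inserted into each sum.
beta_spread = iv_beta(X, Z, y, [1.0 / vt for vt in V])

# Unweighted formulas applied to data divided by sqrt(V[t]).
scale = [vt ** -0.5 for vt in V]

def scaled(m):
    return [[s * val for val in row] for s, row in zip(scale, m)]

beta_scaled = iv_beta(scaled(X), scaled(Z), scaled(y), [1.0] * T)

print("weighted sums:", [row[0] for row in beta_spread])
print("rescaled data:", [row[0] for row in beta_scaled])
```

The two results agree up to rounding, so the weighted estimator can be thought of as ordinary instrumental variables on rescaled data.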

Examples

The example KLEIN.RPF estimates Klein’s Model I. The endogenous explanatory variables are (current) PROFIT, PRIVWAGE and PROD (production). The exogenous and predetermined variables from the model are CONSTANT, TREND, the policy variables GOVTWAGE, GOVTEXP and TAXES, and lags of CAPITAL, PROD and PROFIT. The 2SLS estimates of the three structural equations are done with:

instruments constant trend govtwage taxes govtexp $
capital{1} prod{1} profit{1}
linreg(inst) cons
# constant profit{0 1} wagebill
linreg(inst) invst
# constant profit{0 1} capital{1}
linreg(inst) privwage
# constant prod{0 1} trend

INSTRUMENT.RPF estimates labor supply and labor demand functions using 2SLS. The HOURS and LWAGE (log wage) variables are assumed to be endogenous. EDUC, AGE, KIDSLT6, KIDSGE6, NWIFEINC, EXPER and EXPERSQ (experience and its square) are assumed to be exogenous and enter the hours or wage equation or both.

instruments constant educ age kidslt6 kidsge6 nwifeinc $
exper expersq
linreg(instruments) hours
# constant lwage educ age kidslt6 kidsge6 nwifeinc
linreg(instruments) lwage
# constant hours educ exper expersq