Lag Operators and Filters
The lag operator on a sequence space is defined by \(L(x) \equiv \left\{ {{x_{t - 1}}} \right\}\). The symbol \(B\) (for backshift) is also commonly used for this. This is rather clearly a linear operator, that is, \(L(x + y) = L(x) + L(y)\) and \(L(\alpha x) = \alpha L(x)\) when \(\alpha\) is a scalar. When applied to doubly infinite sequences, it has an inverse \({L^{ - 1}}(x) = \left\{ {{x_{t + 1}}} \right\}\), which is known as the lead operator. We can define \({L^n}(x) = LL \cdots L(x) = \left\{ {{x_{t - n}}} \right\}\), applying \(L\) \(n\) times. Combined with the definition of the lead operator, we have a consistent meaning for \({L^n}\) for any integer \(n\), positive or negative. It's also clear that \({L^n}{L^m} = {L^{n + m}}\), so powers of \(L\) commute.
Lag operators are usually combined into polynomials. The meaning of \({\alpha _0} + {\alpha _1}L + \ldots + {\alpha _p}{L^p}\) is again rather clear as long as we understand that the first term is a constant times the identity operator. A very handy property is that polynomials in the lag operator commute, so we can apply them to a sequence in whatever order we find convenient.
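To make the commutativity property concrete, here is a minimal sketch in Python (using numpy; the data and the helper function are illustrative, not part of any particular package) that applies two lag polynomials in both orders and checks that the results agree wherever both are defined:
\begin{verbatim}
import numpy as np

def apply_lag_poly(coeffs, x):
    # Apply a(L) = coeffs[0] + coeffs[1]*L + ... + coeffs[p]*L^p to the
    # series x; entries that would need data from before the start of
    # the sample are left as NaN.
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    p = len(coeffs) - 1
    for t in range(p, len(x)):
        out[t] = sum(c * x[t - j] for j, c in enumerate(coeffs))
    return out

x = np.random.default_rng(0).normal(size=200)
a = [1.0, -0.5]                    # 1 - .5L
b = [0.25, 0.25, 0.25, 0.25]       # .25(1 + L + L^2 + L^3)

ab = apply_lag_poly(a, apply_lag_poly(b, x))
ba = apply_lag_poly(b, apply_lag_poly(a, x))
print(np.allclose(ab[4:], ba[4:]))   # True: the order doesn't matter
\end{verbatim}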
Applying a lag polynomial to a sequence is known as filtering it. While lag polynomials in various forms are crucial to most time series analysis, we're going to look first at just applying these to observed data. The point of filtering data is to isolate some aspect of the data in which we are particularly interested. The rather well-known Hodrick-Prescott filter (which is similar to, but not quite, a linear filter) is designed to isolate the trend component of the data, which is then usually subtracted off to leave just the cyclical part. The X11 seasonal adjustment (again, not quite a linear filter) is designed to eliminate the seasonal part of the data, returning the trend plus cycle plus any irregular variation.
The two main classes of filters are smoothing filters and differencing filters. Smoothing filters are also known as lowpass filters. Smoothing filters will typically have a rather smooth set of coefficients for the lags in the polynomial. For instance, when weekly unemployment claims are announced on Fridays, two numbers are usually reported: the current week and a four week moving average. The four week moving average would be written as applying \((1/4)(1 + L + {L^2} + {L^3})\) to the data.
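As a concrete sketch (the numbers here are made up for illustration, and numpy is assumed to be available), this filter can be computed as a convolution of the claims series with the coefficient sequence \((.25,.25,.25,.25)\):
\begin{verbatim}
import numpy as np

claims  = np.array([210., 225., 218., 240., 232., 229., 244., 251.])
weights = np.array([0.25, 0.25, 0.25, 0.25])   # (1/4)(1 + L + L^2 + L^3)

# 'valid' keeps only the dates with a full four weeks of data;
# ma4[k] is the average of claims[k] through claims[k+3], i.e. the
# moving average dated at week k+3.
ma4 = np.convolve(claims, weights, mode='valid')
print(ma4)
\end{verbatim}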
A key filter used in the X11 seasonal adjustment algorithm is the Henderson moving average, which is designed to have certain optimality properties among all filters which pass 3rd degree polynomials through unchanged. The hope is that the trend-cycle (which this is trying to isolate) will be, at least locally, similar to a 3rd degree polynomial, and so will pass through, while the much more volatile seasonal will be smoothed away. In contrast, differencing filters usually have "choppy" coefficient patterns. They're designed to eliminate the very smooth long-run movements in order to concentrate on the shorter run. The first difference filter is \((1 - L)\), the second difference is \({(1 - L)^2} = 1 - 2L + {L^2}\) and the seasonal difference is \((1 - {L^s})\). The first difference will eliminate (reduce to a constant) a linear trend, and the second difference will do the same to a quadratic trend.
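To verify the last claim in one line (a check added here, which follows directly from the definitions), apply \(1 - L\) to the linear trend sequence \({x_t} = a + bt\):
\begin{equation} (1 - L)(a + bt) = (a + bt) - \left( {a + b(t - 1)} \right) = b \end{equation}
which is a constant; applying \(1 - L\) a second time gives zero. The same two steps show that \({(1 - L)^2}\) reduces the quadratic trend \(a + bt + c{t^2}\) to the constant \(2c\).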
If the point of differencing is to get rid of the trend to isolate short-run movements, how is this different from smoothing to isolate the trend, and subtracting that from the data? The main difference is that most smoothing filters (HP and Henderson, for instance) are two-sided; one basic two-sided symmetric filter is applied to the bulk of the data range with adjustments at the ends to accommodate the lack of data on one side or the other. This may be fine if you're analyzing a fixed range of data. It is highly inconvenient, however, if you're intending to build a forecasting model. The differencing filters are all one-sided into the past, and thus aren't sneaking a peek at the future.
You need to exercise some care in working with combinations of filters, as the end result can be quite different from what you intended. Consider, for instance, the four week moving average of the unemployment claims data. What happens if we take the difference of this week's average from last week's? This doesn't seem unreasonable. However, what we get is \(.25(1 - L)(1 + L + {L^2} + {L^3}) = .25(1 - {L^4})\). This now doesn't seem quite so sensible. It's the average weekly change of the value over the past four weeks. (Note that you can see this clearly by commuting the operators to get \(.25(1 + L + {L^2} + {L^3})(1 - L)\).) While not the most meaningless piece of information ever devised, it's hardly the ideal summary of the information content of this week's data. (Much more sensible would be the difference between the current value and the four week moving average: \(1 - .25(1 + L + {L^2} + {L^3}) = .75 - .25L - .25{L^2} - .25{L^3}\).) The potentially odd behavior of a combination of difference and smoothing operations is known as the Slutzky effect. It's also possible for (mis)use of filters to cause confusion in the apparent timing relationship between two series. For instance, the first difference of a series will always seem to "lead" its level, in the sense that it hits peaks and troughs first, since growth has to slow before it stops. Friedman and Schwartz found that changes in money "led" levels of money income (GNP); it was pointed out by Tobin (1970) that this could easily be an artifact of comparing a differenced series with a levels series.
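Because composing two lag polynomials amounts to convolving their coefficient sequences, the algebra above is easy to check numerically. The following Python sketch (with numpy, added here for illustration) prints the coefficients of the two filters just discussed:
\begin{verbatim}
import numpy as np

diff = np.array([1.0, -1.0])                  # 1 - L
ma4  = np.array([0.25, 0.25, 0.25, 0.25])     # .25(1 + L + L^2 + L^3)

# Coefficients of .25(1 - L)(1 + L + L^2 + L^3)
print(np.convolve(diff, ma4))   # [ 0.25  0.  0.  0. -0.25], i.e. .25(1 - L^4)

# Coefficients of the more sensible 1 - .25(1 + L + L^2 + L^3)
alt = -ma4.copy()
alt[0] += 1.0
print(alt)                      # [ 0.75 -0.25 -0.25 -0.25]
\end{verbatim}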
Linear filters on discrete data can be written more generally as
\begin{equation} \sum\limits_{s = - \infty }^\infty {{a_s}{x_{t - s}}} \equiv {\left( {a * x} \right)_t} \end{equation}
which is known as the convolution of the sequences \(a\) and \(x\). Since the \(x\) sequence will, in general, represent data, we're probably not going to be able to impose much on its tail behavior other than some boundedness conditions, so for the infinite sum to make sense, we're going to have to assume, at minimum, that the \(a\) sequence is absolutely summable. The analogue for a continuous time process is also known as a convolution:
\begin{equation} \int_{s = - \infty }^\infty {a(s)x\left( {t - s} \right)\,ds} \end{equation}
Observed GDP, for instance, is obtained by a smoothing filter applied to continuous time output, where
\begin{equation} a(s) = \left\{ {\begin{array}{*{20}{c}} 1 & {0 \le s \le 1} \\ 0 & {{\rm{otherwise}}} \\ \end{array}} \right. \end{equation}
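With that choice of \(a(s)\), the convolution reduces, after the substitution \(u = t - s\), to
\begin{equation} \int_{s = - \infty }^\infty {a(s)x\left( {t - s} \right)\,ds} = \int_0^1 {x\left( {t - s} \right)\,ds} = \int_{t - 1}^t {x\left( u \right)\,du} \end{equation}
that is, output accumulated over the most recent unit-length reporting period.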
Finite degree polynomials are easy to define, and their properties can be verified fairly easily. A key question is how far we can push this relationship between polynomials in the lag operator and regular polynomials. The answer is: quite a ways. A question which will be of considerable interest is whether a lag polynomial is invertible, which means having an inverse in non-negative powers of \(L\). The first difference operator, for instance, is not. Apply \(1 - L\) to any constant sequence and you get a zero sequence. Apply any linear filter to a sequence of zeros and you get zeros, not our original data, so no inverse is possible. Consider instead \(1 - \alpha L\) where \(\left| \alpha \right| < 1\). Can we invert this as if \(L\) were a complex number, by defining
\begin{equation} {(1 - \alpha L)^{ - 1}} = 1 + \alpha L + {\alpha ^2}{L^2} + \ldots + {\alpha ^n}{L^n} + \ldots \end{equation}
The answer is yes. Hamilton (1994) works this out for this specific case on page 28.
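A quick numerical check of that expansion (a Python sketch with numpy, added here; the series and the value of \(\alpha\) are arbitrary): filter a series by \(1 - \alpha L\), apply the truncated inverse \(1 + \alpha L + \ldots + {\alpha ^n}{L^n}\), and compare with the original data.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
alpha, n = 0.6, 40

# y_t = (1 - alpha L) x_t, computed where the lagged value exists
y = x[1:] - alpha * x[:-1]

# Apply the truncated inverse  sum_{k=0}^{n} alpha^k L^k  to y
recovered = np.array([sum(alpha**k * y[t - k] for k in range(n + 1))
                      for t in range(n, len(y))])

# The difference from the original x is just the truncation error,
# alpha^(n+1) times an old data value, which is negligible here.
print(np.max(np.abs(recovered - x[n + 1:])))
\end{verbatim}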
In general, the z-transform of a lag polynomial is what you get when you replace the lag operator \(L\) with the complex number \(z\). It's a fairly deep result in functional analysis that if a complex function \(f(z)\) is well-behaved (differentiable, that is, holomorphic) on, inside and slightly outside the unit circle, then its power (Taylor) series expansion can be applied with \(L\) replacing \(z\) to define \(f(L)\). For instance, if you ever come up with a use for \(\exp (L)\), it can be computed by \(\sum\limits_{k = 0}^\infty {\frac{{{L^k}}}{{k!}}} \). If \(f\) is holomorphic on, slightly inside and slightly outside the unit circle, the Laurent expansion \(\sum\limits_{k = - \infty }^\infty {{c_k}{L^k}}\) can be used instead. The significance of the unit circle is that it forms the set of eigenvalues of the lag operator. For example, if we look at \(1 - \alpha L\), where now \(\left| \alpha \right| > 1\), this will not have an inverse in non-negative powers because \(1 - \alpha z\) has a root at \(1/\alpha \), which is inside the unit circle, so \({(1 - \alpha z)^{ - 1}}\) will not be differentiable there. However, it does have an expansion into negative powers, since it has no problem on and outside the unit circle:
\begin{equation} {\left( {1 - \alpha z} \right)^{ - 1}} = - {\left( {\alpha z} \right)^{ - 1}}{\left( {1 - {\alpha ^{ - 1}}{z^{ - 1}}} \right)^{ - 1}} = - {\left( {\alpha z} \right)^{ - 1}}\sum\limits_{k = 0}^\infty {{\alpha ^{ - k}}{z^{ - k}}} = - \sum\limits_{k = 1}^\infty {{\alpha ^{ - k}}{z^{ - k}}} \end{equation}
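As a check added here (it's immediate from the formula), multiplying the proposed expansion by \(1 - \alpha z\) telescopes back to 1:
\begin{equation} \left( {1 - \alpha z} \right)\left( { - \sum\limits_{k = 1}^\infty {{\alpha ^{ - k}}{z^{ - k}}} } \right) = - \sum\limits_{k = 1}^\infty {{\alpha ^{ - k}}{z^{ - k}}} + \sum\limits_{k = 1}^\infty {{\alpha ^{ - (k - 1)}}{z^{ - (k - 1)}}} = 1 \end{equation}
with everything beyond the constant term cancelling; the series converges on and outside the unit circle because \(\left| {1/\alpha } \right| < 1\).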
Inverting a lag polynomial into powers of the lead operator is generally not going to be very helpful, though there are some applications for it. If lack of invertibility into non-negative powers is problematic, it turns out that, in some situations, we can use a "root flipper" to take care of this. The function \(\frac{{\bar \alpha - z}}{{1 - \alpha z}}\) maps the unit circle to itself while replacing a root inside the unit circle with one outside it, or vice versa. These are called Blaschke factors. They can be useful, but aren't a cure-all for invertibility problems: if you filter data by \(1 - 2L\), no Blaschke factor will allow you to recover the data from the current and past values of the filtered series. Blaschke factors come into play when an estimation method doesn't (or can't) distinguish between forward and backward representations. When we're dealing with sequence space, mathematically there is no real difference between the positive and negative directions; the identification of this as time sequencing, with a past of observed data and a future of data yet to be seen, is something that we are imposing.
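The claim that the Blaschke factor maps the unit circle to itself can be verified directly (this calculation is added here): for \(z = {e^{i\omega }}\) on the unit circle,
\begin{equation} \left| {1 - \alpha {e^{i\omega }}} \right| = \left| {{e^{i\omega }}} \right|\,\left| {{e^{ - i\omega }} - \alpha } \right| = \left| {\overline {{e^{i\omega }} - \bar \alpha } } \right| = \left| {\bar \alpha - {e^{i\omega }}} \right| \end{equation}
so the numerator and denominator of \(\frac{{\bar \alpha - z}}{{1 - \alpha z}}\) have the same modulus whenever \(\left| z \right| = 1\), and the factor itself has modulus one there.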