Chapter Objectives
By the end of this chapter, the reader should expect to accomplish the following:
Explain and analyze linear autoregressive models;
Understand the classical approaches to identifying, fitting, and diagnosing autoregressive models;
Apply simple heteroscedastic regression techniques to time series data;
Understand how exponential smoothing can be used to predict and filter time series; and
Project multivariate time series data onto lower dimensional spaces with principal component analysis.
Autoregressive Modeling
Preliminaries
Before we can build a model to predict $Y_t$, we recall some basic definitions and terminology, starting in a continuous-time setting and thereafter continuing solely in discrete time.
Stochastic Process
A stochastic process is a sequence of random variables, indexed by continuous time: $\{Y_t\}_{t=-\infty}^{\infty}$.
Time Series
A time series is a sequence of observations of a stochastic process at discrete times over a specific interval: $\{y_t\}_{t=1}^{n}$.
Autocovariance
The $j$th autocovariance of a time series is $\gamma_{jt} := \mathbb{E}[(y_t - \mu_t)(y_{t-j} - \mu_{t-j})]$, where $\mu_t := \mathbb{E}[y_t]$.
Covariance (Weak) Stationarity
A time series is weakly (or wide-sense) covariance stationary if it has a time-constant mean and autocovariances of all orders: $\mu_t = \mu, \ \forall t$ and $\gamma_{jt} = \gamma_j, \ \forall t$.
Autocorrelation
The $j$th autocorrelation, $\tau_j$, is just the $j$th autocovariance divided by the variance: $\tau_j = \gamma_j / \gamma_0$.
White Noise
White noise, $\epsilon_t$, is i.i.d. error which satisfies all three conditions:
a. $\mathbb{E}[\epsilon_t] = 0, \ \forall t$;
b. $\mathbb{V}[\epsilon_t] = \sigma^2, \ \forall t$; and
c. $\epsilon_t$ and $\epsilon_s$ are independent for $t \neq s$.
Gaussian white noise just adds a normality assumption to the error.
White noise error is often referred to as a "disturbance," "shock," or "innovation" in the financial econometrics literature.
Autoregressive Processes
An autoregressive process describes $y_t$ as a linear combination of $p$ past observations plus white noise.
AR(p) Process
The $p$th order autoregressive process of a variable $Y_t$ depends only on the previous values of the variable plus a white noise disturbance term: $y_t = \mu + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t$,
where $\epsilon_t$ is independent of $\{y_{t-i}\}_{i=1}^{p}$. We refer to $\mu$ as the drift term and to $p$ as the order of the model.
Define the lag polynomial $\phi(L) := (1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)$, where $y_{t-j}$ is the $j$th lagged observation of $y_t$ given by the lag operator (or backshift operator), $y_{t-j} = L^j[y_t]$.
The AR(p) process can then be expressed in the more compact form $\phi(L)[y_t] = \mu + \epsilon_t$.
Stability
Stability concerns whether past disturbances exhibit a growing or decaying impact on the current value of $y$ as the lag increases.
To see this, consider the AR(1) process and write $y_t$ in terms of the inverse of $\phi(L)$: $y_t = \phi^{-1}(L)[\mu + \epsilon_t]$.
For an AR(1) process, $y_t = \frac{1}{1-\phi L}[\mu + \epsilon_t] = \sum_{j=0}^{\infty} \phi^j L^j[\mu + \epsilon_t]$.
The infinite sum will be stable, i.e. the $\phi^j$ terms do not grow with $j$, provided that $|\phi| < 1$.
unstable AR(p) processes exhibit the counter-intuitive behavior that the error disturbance terms become increasingly influential as the lag increases.
We can calculate the Impulse Response Function (IRF), $\frac{\partial y_t}{\partial \epsilon_{t-j}}, \ \forall j$, to characterize the influence of past disturbances.
For the AR(1) model, the IRF is given by $\phi^j$ and hence decays geometrically when the model is stable.
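The geometric decay of the IRF can be checked numerically. The sketch below (names are illustrative) propagates a single unit shock through the AR(1) recursion and compares the result with the analytic IRF $\phi^j$:

```python
import numpy as np

def irf_by_simulation(phi: float, horizon: int) -> np.ndarray:
    """Response of y at lag j to a unit shock in y_t = phi*y_{t-1} + eps_t."""
    y = np.zeros(horizon)
    y[0] = 1.0  # the unit impulse enters at j = 0
    for j in range(1, horizon):
        y[j] = phi * y[j - 1]  # no further shocks, only propagation
    return y

phi = 0.8
analytic = phi ** np.arange(10)        # IRF of a stable AR(1): phi**j
simulated = irf_by_simulation(phi, 10)
```

Since $|\phi| < 1$, the simulated response shrinks toward zero, matching $\phi^j$ term by term.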
Stationarity
Stationarity is a sufficient condition for the autocorrelation function of an AR(p) model to converge to zero as the lag increases.
Factor the characteristic polynomial as $\Phi(z) = (1-\lambda_1 z)(1-\lambda_2 z)\cdots(1-\lambda_p z) = 0$.
An AR(p) model is strictly stationary and ergodic if all the roots $z_i = 1/\lambda_i$ lie outside the unit circle in the complex plane $\mathbb{C}$.
That is, $|\lambda_i| < 1$ for $i \in \{1,\dots,p\}$, where $|\cdot|$ is the modulus of a complex number.
If the characteristic equation has at least one unit root, with all other roots lying outside the unit circle, then the process is a special case of non-stationarity and, in particular, is not strictly stationary.
Stationarity of Random Walk
We can show that the following random walk (a zero-mean AR(1) process) is not strictly stationary: $y_t = y_{t-1} + \epsilon_t$.
Written in compact form this gives $\Phi(L)[y_t] = \epsilon_t$, with $\Phi(L) = 1 - L$,
and the characteristic polynomial, $\Phi(z) = 1 - z = 0$, implies the real root $z = 1$. Hence the root lies on the unit circle and the model is a special case of non-stationarity.
Finding the roots of a polynomial is equivalent to finding eigenvalues.
The Cayley–Hamilton theorem implies that the roots of any monic polynomial can be found by turning it into a matrix and finding the eigenvalues.
Given the degree-$p$ monic polynomial $q(z) = c_0 + c_1 z + \dots + c_{p-1}z^{p-1} + z^p$,
we define the $p \times p$ companion matrix
$$C := \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ -c_0 & -c_1 & \cdots & -c_{p-2} & -c_{p-1} \end{pmatrix}.$$
The characteristic polynomial satisfies $\det(\lambda I - C) = q(\lambda)$, and so the eigenvalues of $C$ are the roots of $q$.
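A minimal sketch of this idea, with `companion_matrix` as an illustrative helper: build the companion matrix of a monic polynomial and recover its roots as eigenvalues.

```python
import numpy as np

def companion_matrix(c):
    """C for q(z) = c[0] + c[1] z + ... + c[p-1] z^(p-1) + z^p:
    ones on the superdiagonal, -c along the last row."""
    p = len(c)
    C = np.zeros((p, p))
    C[:-1, 1:] = np.eye(p - 1)
    C[-1, :] = -np.asarray(c, dtype=float)
    return C

# q(z) = z^2 - 1 = (z - 1)(z + 1), so c = [-1, 0] and the roots are +/- 1.
C = companion_matrix([-1.0, 0.0])
roots = np.sort(np.linalg.eigvals(C).real)
```

The same idea underlies `numpy.roots`, which also forms a companion matrix internally.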
Partial Autocorrelations
The order, $p$, of an AR(p) model can be determined from time series data, provided the data is stationary.
This signature encodes the memory in the model and is given by the "partial autocorrelations":
each partial autocorrelation measures the correlation of a random variable, $y_t$, with its lag, $y_{t-h}$, while controlling for the intermediate lags.
Partial Autocorrelation
A partial autocorrelation at lag $h \geq 2$ is a conditional autocorrelation between a variable, $y_t$, and its $h$th lag, $y_{t-h}$, under the assumption that the values of the intermediate lags, $y_{t-1},\dots,y_{t-h+1}$, are controlled: $\tilde{\tau}_h := \tilde{\tau}_{t,t-h} := \frac{\tilde{\gamma}_h}{\sqrt{\tilde{\gamma}_{t,h}\,\tilde{\gamma}_{t-h,h}}}$,
where $\tilde{\gamma}_h := \tilde{\gamma}_{t,t-h} := \mathbb{E}\big[\big(y_t - P(y_t \mid y_{t-1},\dots,y_{t-h+1})\big)\big(y_{t-h} - P(y_{t-h} \mid y_{t-1},\dots,y_{t-h+1})\big)\big]$
is the lag-$h$ partial autocovariance, $P(W \mid Z)$ is an orthogonal projection of $W$ onto the set $Z$, and $\tilde{\gamma}_{t,h} := \mathbb{E}\big[\big(y_t - P(y_t \mid y_{t-1},\dots,y_{t-h+1})\big)^2\big]$.
The partial autocorrelation function $\tilde{\tau} : \mathbb{N} \to [-1,1]$ is the map $h \mapsto \tilde{\tau}_h$. The plot of $\tilde{\tau}_h$ against $h$ is referred to as the partial correlogram.
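In practice the sample PACF can be estimated via regression: the lag-$h$ partial autocorrelation equals the coefficient on the deepest lag when $y_t$ is regressed on its first $h$ lags. A sketch under that assumption, with `pacf` as an illustrative helper (not a library function):

```python
import numpy as np

def pacf(y, max_lag):
    """Sample PACF: for each h, regress y_t on (y_{t-1}, ..., y_{t-h});
    the coefficient on the deepest lag is the lag-h partial autocorrelation."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    T = len(y)
    out = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        # regressor columns are lags 1..h of the target y[h:]
        X = np.column_stack([y[h - i: T - i] for i in range(1, h + 1)])
        beta, *_ = np.linalg.lstsq(X, y[h:], rcond=None)
        out[h - 1] = beta[-1]
    return out

# For an AR(1) with phi = 0.7, the PACF should be ~0.7 at lag 1 and near zero after.
rng = np.random.default_rng(0)
T = 2000
y = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + eps[t]
p = pacf(y, 3)
```

The sharp cutoff after lag $p$ is exactly the signature used to identify the order of an AR(p) model.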
Maximum Likelihood Estimation
The exact likelihood, when the density of the data is independent of $(\phi, \sigma_n^2)$, is $L(y, x; \phi, \sigma_n^2) = \prod_{t=1}^{T} f_{Y_t \mid X_t}(y_t \mid x_t; \phi, \sigma_n^2)\, f_{X_t}(x_t)$.
In this case the exact likelihood is proportional to the conditional likelihood function: $L(y, x; \phi, \sigma_n^2) \propto L(y \mid x; \phi, \sigma_n^2) = \prod_{t=1}^{T} f_{Y_t \mid X_t}(y_t \mid x_t; \phi, \sigma_n^2) = (2\pi\sigma_n^2)^{-T/2}\exp\left\{-\frac{1}{2\sigma_n^2}\sum_{t=1}^{T}(y_t - \phi^T x_t)^2\right\}$.
In many cases such an assumption about the independence of the density of the data and the parameters is not warranted.
Consider the zero-mean AR(1) with unknown noise variance: $y_t = \phi y_{t-1} + \epsilon_t$, $\epsilon_t \sim N(0, \sigma_n^2)$, so that $Y_t \mid Y_{t-1} \sim N(\phi y_{t-1}, \sigma_n^2)$ and $Y_1 \sim N\left(0, \frac{\sigma_n^2}{1-\phi^2}\right)$.
The exact likelihood is $L(y; \phi, \sigma_n^2) = f_{Y_1}(y_1; \phi, \sigma_n^2)\prod_{t=2}^{T} f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1}; \phi, \sigma_n^2) = \left(\frac{2\pi\sigma_n^2}{1-\phi^2}\right)^{-1/2}\exp\left\{-\frac{1-\phi^2}{2\sigma_n^2}y_1^2\right\}\,(2\pi\sigma_n^2)^{-\frac{T-1}{2}}\exp\left\{-\frac{1}{2\sigma_n^2}\sum_{t=2}^{T}(y_t - \phi y_{t-1})^2\right\}$.
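The exact likelihood above can be coded directly. The sketch below (all names illustrative) evaluates the exact log-likelihood of simulated AR(1) data and maximizes it over a grid of $\phi$ values, holding $\sigma_n^2$ at its true value for simplicity:

```python
import numpy as np

def ar1_exact_loglik(y, phi, sigma2):
    """Exact log-likelihood of a zero-mean AR(1), assuming |phi| < 1."""
    T = len(y)
    # stationary start: Y_1 ~ N(0, sigma2 / (1 - phi^2))
    ll = -0.5 * np.log(2 * np.pi * sigma2 / (1 - phi**2))
    ll -= (1 - phi**2) * y[0]**2 / (2 * sigma2)
    # conditional terms: Y_t | Y_{t-1} ~ N(phi * y_{t-1}, sigma2)
    resid = y[1:] - phi * y[:-1]
    ll -= 0.5 * (T - 1) * np.log(2 * np.pi * sigma2)
    ll -= np.sum(resid**2) / (2 * sigma2)
    return ll

rng = np.random.default_rng(3)
T, phi_true, sigma2 = 1000, 0.6, 1.0
y = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + eps[t]

phis = np.linspace(-0.95, 0.95, 381)
phi_hat = phis[np.argmax([ar1_exact_loglik(y, p, sigma2) for p in phis])]
```

In practice one would maximize jointly over $(\phi, \sigma_n^2)$ with a numerical optimizer; the grid keeps the sketch self-contained.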
Heteroscedasticity
Consider the heteroscedastic AR(p) model $y_t = \mu + \sum_{i=1}^{p}\phi_i y_{t-i} + \epsilon_t$, $\epsilon_t \sim N(0, \sigma_{n,t}^2)$.
The ARCH test: Engle's ARCH test is constructed from the property that if the residuals are heteroscedastic, the squared residuals are autocorrelated. The Ljung–Box test is then applied to the squared residuals.
The estimation procedure for heteroscedastic models consists of two steps:
1. estimation of the errors by maximum likelihood, treating the errors as independent; and
2. estimation of the model parameters under a more general maximum likelihood estimation which treats the errors as time-dependent.
The conditional likelihood is $L(y \mid X; \phi, \sigma_n^2) = \prod_{t=1}^{T} f_{Y_t \mid X_t}(y_t \mid x_t; \phi, \sigma_n^2) = (2\pi)^{-T/2}\det(D)^{-1/2}\exp\left\{-\frac{1}{2}(y - X\phi)^T D^{-1}(y - X\phi)\right\}$,
where $D_{tt} = \sigma_{n,t}^2$ is the diagonal covariance matrix and $X \in \mathbb{R}^{T \times p}$ is the data matrix with rows $[X]_t = x_t^T$.
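With diagonal $D$, maximizing this conditional likelihood over $\phi$ reduces to weighted least squares, minimizing $\sum_t (y_t - \phi^T x_t)^2/\sigma_{n,t}^2$. A sketch, with `wls` as an illustrative helper and a noiseless check so recovery is exact:

```python
import numpy as np

def wls(X, y, sigma2_t):
    """Solve the weighted normal equations (X^T D^{-1} X) phi = X^T D^{-1} y."""
    w = 1.0 / np.asarray(sigma2_t)
    Xw = X * w[:, None]  # each row of X scaled by its weight 1/sigma_t^2
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 2))
phi_true = np.array([1.0, -0.5])
sigma2_t = np.linspace(0.5, 2.0, 200)  # time-varying error variances
y = X @ phi_true                       # no noise, so wls recovers phi exactly
phi_hat = wls(X, y, sigma2_t)
```

Observations with larger $\sigma_{n,t}^2$ receive smaller weight, which is exactly the effect of $D^{-1}$ in the exponent.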
Moving Average Processes
The Wold representation theorem (a.k.a. the Wold decomposition) states that every covariance stationary time series can be written as the sum of two time series, one deterministic and one stochastic.
The deterministic component is modeled as an AR(p) process; the stochastic component is a "moving average process," or MA(q) process.
MA(q) Process
The $q$th order moving average process is a linear combination of the white noise terms $\{\epsilon_{t-i}\}_{i=0}^{q}$, $\forall t$: $y_t = \mu + \sum_{i=1}^{q}\theta_i \epsilon_{t-i} + \epsilon_t$.
An AR(1) process can be rewritten as an MA($\infty$) process.
Suppose that the AR(1) process has drift $\mu$ and noise variance $\sigma_n^2$; then by a binomial expansion of the operator $(1-\phi L)^{-1}$ we have $y_t = \frac{\mu}{1-\phi} + \sum_{j=0}^{\infty}\phi^j \epsilon_{t-j}$,
where the moments can be easily found: $\mathbb{E}[y_t] = \frac{\mu}{1-\phi}$ and $\mathbb{V}[y_t] = \sum_{j=0}^{\infty}\phi^{2j}\mathbb{E}[\epsilon_{t-j}^2] = \sigma_n^2\sum_{j=0}^{\infty}\phi^{2j} = \frac{\sigma_n^2}{1-\phi^2}$.
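These stationary moments can be verified by simulation. A sketch, assuming $\mu = 0.5$, $\phi = 0.8$, $\sigma_n = 1$ (illustrative choices), that simulates a long AR(1) path and compares sample moments with $\mu/(1-\phi)$ and $\sigma_n^2/(1-\phi^2)$:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, phi, sigma = 0.5, 0.8, 1.0
T = 200_000
eps = sigma * rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = mu + phi * y[t - 1] + eps[t]
y = y[1000:]  # drop burn-in so the path is near its stationary distribution

mean_theory = mu / (1 - phi)           # = 2.5
var_theory = sigma**2 / (1 - phi**2)   # = 2.777...
```

The sample mean and variance of the simulated path should sit close to the theoretical values, with the discrepancy shrinking as $T$ grows.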
GARCH
The Generalized Autoregressive Conditional Heteroscedastic (GARCH) model is a parametric, linear, heteroscedastic model.
The GARCH(p,q) model specifies that the conditional variance (i.e., volatility) is given by an ARMA(p,q)-type recursion, with $p$ lagged conditional variances and $q$ lagged squared noise terms: $\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p}\beta_i \sigma_{t-i}^2$.
A necessary condition for model stationarity is the following constraint: $\left(\sum_{i=1}^{q}\alpha_i + \sum_{i=1}^{p}\beta_i\right) < 1$.
When the model is stationary, the long-run volatility converges to the unconditional variance of $\epsilon_t$: $\sigma^2 := \mathrm{var}(\epsilon_t) = \frac{\alpha_0}{1 - \left(\sum_{i=1}^{q}\alpha_i + \sum_{i=1}^{p}\beta_i\right)}$.
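A minimal simulation of a GARCH(1,1), with illustrative parameter values chosen so that $\alpha_1 + \beta_1 < 1$, lets us check that the sample variance of $\epsilon_t$ matches the unconditional variance $\alpha_0/(1 - \alpha_1 - \beta_1)$:

```python
import numpy as np

# GARCH(1,1): sigma2_t = a0 + a1*eps_{t-1}^2 + b1*sigma2_{t-1},
# with eps_t = sigma_t * z_t and z_t ~ N(0, 1).
rng = np.random.default_rng(7)
a0, a1, b1 = 0.1, 0.1, 0.8   # a1 + b1 = 0.9 < 1, so the model is stationary
T = 100_000
eps = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = a0 / (1 - a1 - b1)  # start at the unconditional variance
for t in range(1, T):
    sigma2[t] = a0 + a1 * eps[t - 1]**2 + b1 * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

long_run = a0 / (1 - a1 - b1)  # unconditional variance, here 1.0
```

Even though the conditional variance clusters (high-volatility periods follow high-volatility periods), the unconditional variance of the simulated shocks converges to the long-run level.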
Exponential Smoothing
In this setting, let $\alpha \in (0,1)$ denote the smoothing factor (or smoothing coefficient), $\tilde{y}_{t+1}$ the smoothed prediction, and $y_t - \tilde{y}_t$ the forecast error.
The model is $\tilde{y}_{t+1} = \tilde{y}_t + \alpha(y_t - \tilde{y}_t)$,
or equivalently $\tilde{y}_{t+1} = \alpha y_t + (1-\alpha)\tilde{y}_t$.
Fitting Time Series Models: The Box-Jenkins Approach
The three basic steps of the Box-Jenkins modeling approach:
(I)dentification: determining the order of the model (a.k.a. model selection);
(E)stimation: estimation of model parameters;
(D)iagnostic checking: evaluating the fit of the model.
Stationarity
Before the order of the model can be determined, the time series must be tested for stationarity.
Augmented Dickey–Fuller (ADF) test: a standard statistical test for covariance stationarity, which accounts for a (c)onstant drift and a (t)ime trend.
Attempting to fit a time series model to non-stationary data will result in dubious interpretations of the estimated partial autocorrelation function and poor predictions, and should therefore be avoided.
Transformation to Ensure Stationarity
Any trending time series process is non-stationary.
Detrending methods
differencing
Kalman filters
Markov-switching models
advanced neural networks
Identification
partial correlogram
Information Criterion
We use the Akaike Information Criterion (AIC) to measure the quality of fit: $\mathrm{AIC} = \ln(\hat{\sigma}^2) + \frac{2k}{T}$,
where $\hat{\sigma}^2$ is the residual variance
and $k = p + q + 1$ is the total number of parameters estimated.
The goal is to select the model which minimizes the AIC by first using maximum likelihood estimation and then adding the penalty term.
the AIC favors the best fit with the fewest number of parameters.
It is similar to regularization in machine learning, where the loss function is penalized by a LASSO penalty ($L_1$ norm of the parameters) or a ridge penalty ($L_2$ norm of the parameters).
The AIC is evaluated post hoc, once the maximum likelihood function has been evaluated, whereas in machine learning models the penalized loss function is directly minimized.
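AIC-based order selection can be sketched with OLS fits of candidate AR(p) models; `ar_aic` is an illustrative helper using the formula $\ln(\hat\sigma^2) + 2k/T$ with $k = p + 1$ (AR coefficients plus drift), since no MA terms are fitted here:

```python
import numpy as np

def ar_aic(y, p):
    """Fit an AR(p) with drift by OLS and return AIC = ln(sigma_hat^2) + 2k/T."""
    T = len(y) - p
    X = np.column_stack([np.ones(T)] + [y[p - i: len(y) - i] for i in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    sigma2 = np.mean((target - X @ beta) ** 2)  # residual variance
    k = p + 1
    return np.log(sigma2) + 2 * k / T

# Simulate an AR(2) and pick the order minimizing AIC among candidates 1..5.
rng = np.random.default_rng(1)
T = 3000
y = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]
best_p = min(range(1, 6), key=lambda p: ar_aic(y, p))
```

An underfit AR(1) leaves the lag-2 structure in the residuals and is clearly penalized; the tiny $2k/T$ term then discourages needlessly large orders.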
Model Diagnostics
Once the model is fitted we must assess whether the residuals exhibit autocorrelation, suggesting the model is underfitting.
The residuals of a well-fitted time series model should be white noise.
A short summary of some of the most useful diagnostic tests for time series modeling in finance:

Chi-squared test: Used to determine whether the confusion matrix of a classifier is statistically significant, or merely white noise.

t-test: Used to determine whether the outputs of two separate regression models are statistically different on i.i.d. data.

Diebold–Mariano test: Used to determine whether the outputs of two separate time series models are statistically different.

ARCH test: Engle's ARCH test is constructed from the property that if the residuals are heteroscedastic, the squared residuals are autocorrelated. The Ljung–Box test is then applied to the squared residuals.

Portmanteau test: A general test for whether the error in a time series model is autocorrelated. Example tests include the Ljung–Box and the Box–Pierce tests.
Time Series Cross-Validation
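Because observations are ordered in time, standard k-fold cross-validation leaks future information into the training set. A common remedy is walk-forward (expanding-window) validation, where the training window always precedes the test window. A sketch, with `walk_forward_splits` as an illustrative helper:

```python
def walk_forward_splits(n, initial, horizon=1):
    """Yield (train_indices, test_indices) pairs with an expanding training
    window: train on [0, t), test on [t, t + horizon), then advance t."""
    t = initial
    while t + horizon <= n:
        yield list(range(t)), list(range(t, t + horizon))
        t += horizon

splits = list(walk_forward_splits(6, initial=4, horizon=1))
# -> [([0, 1, 2, 3], [4]), ([0, 1, 2, 3, 4], [5])]
```

Each candidate model is refitted on every training window and scored on the subsequent test window, so the evaluation mimics genuine out-of-sample forecasting.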
Summary
We have covered the following objectives:
Explain and analyze linear autoregressive models;
Understand the classical approaches to identifying, fitting, and diagnosing autoregressive models;
Apply simple heteroscedastic regression techniques to time series data;
Understand how exponential smoothing can be used to predict and filter time series.