In many forecasting exercises it is useful to include several lags of a variable as potential predictors. I have not been able to find a neat solution online for creating a data set that includes several lags of each variable, so I want to share my own.
The goal is to create h = 12 lags of each variable in a data set. So if you have three variables in a matrix, say [x1, x2, x3], you want a new matrix of the form [x1_l0, x1_l1, x1_l2, ..., x1_lh, x2_l0, x2_l1, x2_l2, ..., x2_lh, x3_l0, x3_l1, x3_l2, ..., x3_lh], where h is the maximum lag length and lag 0 is the original series itself. With three variables and h = 12, this gives 3 × 13 = 39 columns.
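As a quick check of this layout, the column names of the target matrix can be generated directly. This is a minimal sketch assuming three variables named x1, x2, x3 and h = 12; `vars` and `target_names` are illustrative names, not taken from the original code.

```r
# Illustrative setup: three variables and a maximum lag of h = 12
vars <- c("x1", "x2", "x3")
h <- 12

# Each variable contributes h + 1 columns (lags 0, 1, ..., h),
# grouped variable by variable as in the target layout
target_names <- paste0(rep(vars, each = h + 1), "_l", rep(0:h, times = length(vars)))

length(target_names)               # 39
target_names[c(1, 2, 13, 14, 39)]  # "x1_l0" "x1_l1" "x1_l12" "x2_l0" "x3_l12"
```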
To do this, we'll need the zoo package, which can create several lags of a series at once and fills the resulting matrix of lags with missing values (NA) wherever no earlier observation exists. The forecast package is used to create some example time series data.
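The padded lagging that zoo provides can be mimicked in a few lines of base R, which makes the NA-filling explicit. This is a sketch assuming a plain numeric input vector; `lag_matrix` is a hypothetical helper, not a zoo function.

```r
# Hypothetical base-R stand-in for zoo's padded lags (not part of zoo):
# column j + 1 holds x lagged by j periods, and the first j entries,
# for which no past value exists, are filled with NA.
lag_matrix <- function(x, h) {
  x <- as.numeric(x)
  n <- length(x)
  stopifnot(h >= 0, h < n)
  sapply(0:h, function(j) c(rep(NA_real_, j), x[seq_len(n - j)]))
}

lag_matrix(1:5, h = 2)
#      [,1] [,2] [,3]
# [1,]    1   NA   NA
# [2,]    2    1   NA
# [3,]    3    2    1
# [4,]    4    3    2
# [5,]    5    4    3
```

With zoo itself, the equivalent call should be along the lines of `lag(zoo(x), k = -(0:h), na.pad = TRUE)`. Note that zoo follows the sign convention of `stats::lag`, so negative values of `k` look back in time.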
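First, we will create a sample data set of time series using the arima.sim function. Note that arima.sim is part of R's base stats package rather than the forecast package, so it works even without loading forecast.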
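A minimal sketch of such a data set, assuming three stationary ARMA processes with arbitrarily chosen coefficients and 100 observations each (the original model specifications are not shown):

```r
# Reproducible draws
set.seed(123)
n <- 100

# Three simulated series; the model coefficients are illustrative choices.
# cbind() on ts objects returns a multivariate time series ("mts")
# whose columns are named x1, x2, x3.
series <- cbind(
  x1 = arima.sim(model = list(ar = 0.5), n = n),
  x2 = arima.sim(model = list(ma = 0.3), n = n),
  x3 = arima.sim(model = list(ar = c(0.4, 0.2)), n = n)
)

dim(series)       # 100 3
colnames(series)  # "x1" "x2" "x3"
```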
Now, we will create the lagged data set.
The code avoids explicit loops by using the sapply function, which produces a list whose elements are the lag matrices of the individual variables. This list is then combined column-wise into a single matrix using the do.call function. The next step is to name each predictor by combining the variable name with its respective lag. This uses the same trick as before: build a list whose elements combine the variable name with each lag, and concatenate it into one vector of column names.
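Putting these steps together, a self-contained sketch of the whole pipeline might look like the following. It uses illustrative simulated data and replaces zoo's lagging with a hypothetical base-R helper, `lag_matrix`, so it follows the sapply/do.call structure described above but is not the author's exact code.

```r
set.seed(123)
n <- 100
h <- 12

# Illustrative example data (coefficients chosen arbitrarily)
series <- cbind(
  x1 = arima.sim(model = list(ar = 0.5), n = n),
  x2 = arima.sim(model = list(ma = 0.3), n = n),
  x3 = arima.sim(model = list(ar = c(0.4, 0.2)), n = n)
)

# Hypothetical helper standing in for zoo's padded lags:
# column j + 1 is x lagged by j periods, padded with NA at the start
lag_matrix <- function(x, h) {
  x <- as.numeric(x)
  n <- length(x)
  sapply(0:h, function(j) c(rep(NA_real_, j), x[seq_len(n - j)]))
}

# Step 1: one lag matrix per variable. simplify = FALSE keeps the result a
# list instead of letting sapply collapse the matrices into one long matrix.
lag_list <- sapply(seq_len(ncol(series)),
                   function(i) lag_matrix(series[, i], h),
                   simplify = FALSE)

# Step 2: bind the list elements side by side into an n x 3(h + 1) matrix
lagged <- do.call(cbind, lag_list)

# Step 3: same trick for the names: a list of "<variable>_l<lag>" vectors,
# concatenated into a single character vector
name_list <- sapply(colnames(series),
                    function(v) paste0(v, "_l", 0:h),
                    simplify = FALSE, USE.NAMES = FALSE)
colnames(lagged) <- do.call(c, name_list)

dim(lagged)  # 100 39
```

Because the first h rows contain NA in the longer lags, one would usually keep only complete rows before fitting a forecasting model, e.g. `lagged[complete.cases(lagged), ]`, which leaves n - h = 88 rows here.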