In a recent post we used best subset selection as a way of reducing unnecessary model complexity; this time we are going to use the ridge regression technique.
Both the lasso and ridge regression are known as shrinkage methods. The best subset method uses least squares to fit a model with a subset of the predictors. Shrinkage methods, in contrast, use all the predictors but constrain and regularise their coefficients towards zero. One major difference between the two is that ridge regression ends up keeping all the predictors, while the lasso shrinks some of their coefficients all the way to zero.
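As a reminder of the standard definitions (not specific to this post), writing RSS for the residual sum of squares, β_j for the coefficient of the j-th of the p predictors, and λ ≥ 0 for the shrinkage parameter, ridge regression and the lasso minimise, respectively:

$$\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \qquad \text{and} \qquad \mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$$

The only difference is the form of the penalty, and it is precisely the absolute-value penalty that allows the lasso to set some coefficients exactly to zero.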
Again we will use the classic swiss data set provided with the R datasets package. And again we are interested in predicting the infant mortality of a hypothetical commune using a multiple linear regression model. In the previous post we saw a quick exploratory analysis of the correlations between the different variables.
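As a quick sketch of the setup (the swiss data frame ships with base R's datasets package):

```r
# swiss: 47 observations (communes) and 6 numeric variables,
# Infant.Mortality plus the 5 predictors used below.
data(swiss)
str(swiss)
```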
The glmnet package provides methods to perform ridge regression and the lasso. The main function in the package is glmnet(). This function has a different syntax from other model-fitting functions in R. This time we must pass in an x matrix as well as a y vector, and we do not use the familiar y ∼ x syntax.
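One way to build these inputs (a sketch of the setup; the object names x and y are just illustrative choices) is with model.matrix():

```r
# Predictor matrix: every variable except the response, with the
# intercept column produced by model.matrix() dropped.
x <- model.matrix(Infant.Mortality ~ ., data = swiss)[, -1]

# Response vector: infant mortality for each commune.
y <- swiss$Infant.Mortality
```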
A quick look at the first rows of the matrix shows that it basically contains the values of the 5 predictors for each of the communes.
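For instance, with the x matrix built above:

```r
head(x)   # first communes: Fertility, Agriculture, Examination, Education, Catholic
```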
The glmnet() function takes an alpha argument that determines which method is used. If alpha=0 then ridge regression is used, while if alpha=1 then the lasso is used. We will start with the former.
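A minimal sketch of the ridge fit, using the x and y defined above (the object name ridge_mod is just an illustrative choice):

```r
library(glmnet)

# alpha = 0 selects the ridge penalty; glmnet standardises the
# predictors internally and fits the whole lambda path at once.
ridge_mod <- glmnet(x, y, alpha = 0)

dim(coef(ridge_mod))   # one row per coefficient, one column per lambda value
```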
By default the glmnet function performs ridge regression for an automatically selected range of λ values (the shrinkage parameter). The values are based on nlambda and lambda.min.ratio. Associated with each value of λ is a vector of regression coefficients. For example, the 100th value of λ, a very small one, is close to performing ordinary least squares:
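For instance, assuming the ridge_mod object from the sketch above:

```r
ridge_mod$lambda[100]    # the smallest lambda in the grid
coef(ridge_mod)[, 100]   # coefficients barely shrunk, close to least squares
```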
While the 1st one, with a very large λ, is essentially the null model containing just the intercept, due to the shrinkage of all the predictor coefficients:
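Again with the same hypothetical fit:

```r
ridge_mod$lambda[1]    # the largest lambda in the grid
coef(ridge_mod)[, 1]   # predictor coefficients shrunk essentially to zero
```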
But it would be better to use cross-validation to choose λ. We can do this using cv.glmnet. By default, the function performs ten-fold cross-validation:
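A possible sketch of this step (cv_out and best_lambda are illustrative names):

```r
set.seed(1)   # make the fold assignment reproducible

cv_out <- cv.glmnet(x, y, alpha = 0)   # 10-fold cross-validation by default
plot(cv_out)                           # CV error as a function of log(lambda)

best_lambda <- cv_out$lambda.min       # lambda with the smallest CV error
best_lambda
```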
Once we have the best lambda, we can use predict to obtain the coefficients.
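Something along these lines, reusing the names from the sketches above:

```r
predict(ridge_mod, type = "coefficients", s = best_lambda)
```

Note that, unlike with the lasso, none of these coefficients will be exactly zero: ridge regression keeps all five predictors in the model.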
Next time, the lasso.