L02 Linear Regression

Materials are adapted from "Murphy, Kevin P. Probabilistic machine learning: an introduction. MIT Press, 2022.". This handout is only for teaching. DO NOT DISTRIBUTE.

Least squares linear regression


Least squares estimation


the normal equation (FOC)
the OLS solution
the solution is unique since the Hessian is positive definite (assuming $\mathbf{X}$ has full column rank)
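
The three points above can be checked numerically. The following is a minimal sketch, assuming the objective $\frac{1}{2}\|\mathbf{X}\boldsymbol{w}-\boldsymbol{y}\|_2^2$, for which the normal equation is $\mathbf{X}^{\top}\mathbf{X}\boldsymbol{w}=\mathbf{X}^{\top}\boldsymbol{y}$ and the Hessian is $\mathbf{X}^{\top}\mathbf{X}$. The synthetic data, the seed, and all variable names are illustrative assumptions, not part of the handout.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
N, D = 50, 3
X = rng.normal(size=(N, D))          # synthetic design matrix (full column rank)
w_true = np.array([1.5, -2.0, 0.5])  # hypothetical ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=N)

# Normal equation: X^T X w = X^T y. Solve the linear system directly
# instead of forming the explicit inverse.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The Hessian is X^T X. Strictly positive eigenvalues mean it is
# positive definite, so the minimizer is unique.
hessian_eigvals = np.linalg.eigvalsh(X.T @ X)
```

In practice a QR- or SVD-based solver such as `np.linalg.lstsq` is usually preferred over the normal equation when $\mathbf{X}^{\top}\mathbf{X}$ is ill-conditioned.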

Geometric interpretation of least squares

Weighted least squares

$$\underset{\hat{\boldsymbol{y}} \in \operatorname{span}\left(\left\{\boldsymbol{x}_{:, 1}, \ldots, \boldsymbol{x}_{:, d}\right\}\right)}{\operatorname{argmin}}\|\boldsymbol{y}-\hat{\boldsymbol{y}}\|_{2} .$$
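
The minimization above is the geometric view of least squares: $\hat{\boldsymbol{y}}$ is the orthogonal projection of $\boldsymbol{y}$ onto the span of the columns of $\mathbf{X}$. A minimal sketch, assuming $\mathbf{X}$ has full column rank so that the projection matrix $\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$ exists; the random data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
N, D = 20, 3
X = rng.normal(size=(N, D))     # columns x_{:,1}, ..., x_{:,d}
y = rng.normal(size=N)

# Projection matrix onto the column space of X (assumes full column rank).
P = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = P @ y
residual = y - y_hat

# An arbitrary other point in the span, used only for comparison.
other = X @ rng.normal(size=D)
```

The residual is orthogonal to every column of `X`, and no other point in the span (such as `other`) is closer to `y` than `y_hat`.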

Measuring goodness of fit
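
One common way to quantify goodness of fit is the coefficient of determination, $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, which compares the model's residual sum of squares with that of a baseline that always predicts the mean. The sketch below assumes this measure; the function name `r_squared` and the toy data are illustrative.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y - y_pred) ** 2)    # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - rss / tss

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y, y)                            # exact predictions
baseline = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean
```

Exact predictions give a value of 1 and the mean-only baseline gives 0, so larger values indicate a larger reduction in variance relative to that baseline.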

Ridge regression
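
A minimal sketch of ridge regression, assuming the objective $\|\boldsymbol{y}-\mathbf{X}\boldsymbol{w}\|_2^2+\lambda\|\boldsymbol{w}\|_2^2$ with no separate intercept term, whose minimizer is $(\mathbf{X}^{\top}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{\top}\boldsymbol{y}$. The helper name `ridge_fit` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept term)."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(2)  # arbitrary seed
X = rng.normal(size=(30, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 recovers ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the weights
```

Adding $\lambda\mathbf{I}$ with $\lambda>0$ also makes the linear system well-posed when $\mathbf{X}^{\top}\mathbf{X}$ itself is singular.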

Choosing the strength of the regularizer
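
One simple approach is to try a finite grid of $\lambda$ values and keep the one with the lowest cross-validated error. The sketch below assumes ridge regression, 5 contiguous folds (reasonable here only because the synthetic rows are i.i.d.), and mean squared error as the score; the helper names and the grid are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept term)."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

def cv_mse(X, y, lam, n_folds=5):
    """Average held-out mean squared error over n_folds contiguous folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        w = ridge_fit(X[train_mask], y[train_mask], lam)
        errors.append(np.mean((y[val_idx] - X[val_idx] @ w) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(3)  # arbitrary seed
X = rng.normal(size=(40, 10))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=40)

lams = np.logspace(-3, 3, 13)                  # candidate regularizer strengths
scores = [cv_mse(X, y, lam) for lam in lams]   # one CV score per candidate
best_lam = lams[int(np.argmin(scores))]
```

For real data the rows would normally be shuffled before splitting, and the final model refit on all the data with `best_lam`.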

Lasso regression

Why does $\ell_1$ regularization yield sparse solutions?

Hard vs soft thresholding

Consider the partial derivatives of the lasso objective
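
Setting those partial derivatives to zero one coordinate at a time leads to a soft thresholding update: values within the threshold are set to zero and the rest are shrunk toward zero. Hard thresholding zeroes the same values but leaves the others unshrunk. A minimal sketch of the two operators follows; the function names and the threshold `delta` are illustrative, and how `delta` relates to $\lambda$ depends on how the objective is scaled.

```python
import numpy as np

def soft_threshold(x, delta):
    """sign(x) * max(|x| - delta, 0): zero small values, shrink the rest by delta."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

def hard_threshold(x, delta):
    """Zero values with |x| <= delta and leave the rest unchanged."""
    return np.where(np.abs(x) > delta, x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
soft = soft_threshold(x, 1.0)  # large entries move from +-3 to +-2
hard = hard_threshold(x, 1.0)  # large entries stay at +-3
```

Both operators produce exact zeros for small inputs; only soft thresholding biases the surviving coefficients toward zero.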

Regularization path

Plot the values $\hat{w}_d$ vs $\lambda$ (or vs the bound $B$) for each feature $d$.
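
The values to plot can be computed by solving the lasso problem on a grid of $\lambda$ values. The sketch below assumes the scaling $\frac{1}{2}\|\boldsymbol{y}-\mathbf{X}\boldsymbol{w}\|_2^2+\lambda\|\boldsymbol{w}\|_1$ and a plain coordinate descent solver with a fixed number of sweeps; the solver, the data, and all names are illustrative assumptions rather than the handout's method.

```python
import numpy as np

def soft_threshold(x, delta):
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

def lasso_cd(X, y, lam, n_iters=200):
    """Coordinate descent for 0.5 * ||y - Xw||^2 + lam * ||w||_1."""
    D = X.shape[1]
    w = np.zeros(D)
    a = np.sum(X ** 2, axis=0)              # a_d = sum_n x_nd^2
    for _ in range(n_iters):
        for d in range(D):
            r = y - X @ w + X[:, d] * w[d]  # residual with feature d removed
            c = X[:, d] @ r                 # correlation of feature d with r
            w[d] = soft_threshold(c, lam) / a[d]
    return w

rng = np.random.default_rng(4)  # arbitrary seed
X = rng.normal(size=(60, 4))
y = X @ np.array([3.0, 0.0, -1.0, 0.0]) + 0.1 * rng.normal(size=60)

lam_max = np.max(np.abs(X.T @ y))  # at this value all weights are zero
lams = np.linspace(0.0, lam_max, 20)
path = np.array([lasso_cd(X, y, lam) for lam in lams])  # shape (20, 4)
```

Column `d` of `path` plotted against `lams` gives the curve for feature $d$; the first row is the least squares fit and the last row is all zeros.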

Group lasso

Elastic net (ridge and lasso combined)

$$\mathcal{L}\left(\boldsymbol{w}, \lambda_{1}, \lambda_{2}\right)=\|\boldsymbol{y}-\mathbf{X} \boldsymbol{w}\|^{2}+\lambda_{2}\|\boldsymbol{w}\|_{2}^{2}+\lambda_{1}\|\boldsymbol{w}\|_{1}$$
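
A minimal sketch that evaluates this objective and minimizes it by coordinate descent. For coordinate $d$, setting the subgradient $2(a_d+\lambda_2)w_d-2c_d+\lambda_1\operatorname{sign}(w_d)$ to zero gives a soft-thresholded update, where $a_d$ and $c_d$ are defined in the code comments. The solver choice, the fixed number of sweeps, the data, and all names are illustrative assumptions.

```python
import numpy as np

def elastic_net_objective(w, X, y, lam1, lam2):
    """||y - Xw||^2 + lam2 * ||w||_2^2 + lam1 * ||w||_1."""
    resid = y - X @ w
    return resid @ resid + lam2 * (w @ w) + lam1 * np.sum(np.abs(w))

def elastic_net_cd(X, y, lam1, lam2, n_iters=200):
    """Coordinate descent sketch for the objective above."""
    D = X.shape[1]
    w = np.zeros(D)
    a = np.sum(X ** 2, axis=0)              # a_d = sum_n x_nd^2
    for _ in range(n_iters):
        for d in range(D):
            r = y - X @ w + X[:, d] * w[d]  # residual with feature d removed
            c = X[:, d] @ r                 # c_d
            # Soft-threshold c_d at lam1 / 2, then scale by 1 / (a_d + lam2).
            w[d] = np.sign(c) * max(abs(c) - lam1 / 2.0, 0.0) / (a[d] + lam2)
    return w

rng = np.random.default_rng(5)  # arbitrary seed
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + 0.1 * rng.normal(size=50)

w_en = elastic_net_cd(X, y, lam1=5.0, lam2=1.0)
w_ridge_cd = elastic_net_cd(X, y, lam1=0.0, lam2=1.0)  # lam1 = 0: pure ridge
w_ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(4), X.T @ y)
```

With $\lambda_1=0$ the update reduces to ridge regression, which is why `w_ridge_cd` can be compared against the closed-form `w_ridge`; with $\lambda_2=0$ it reduces to the lasso.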