Minimizing the least squares objective $\mathcal{L}(w) = \lVert y - Xw \rVert_2^2$ and setting the gradient to zero gives the normal equation (FOC):

$$X^\top X \hat{w} = X^\top y$$

Solving for $\hat{w}$ gives the OLS solution:

$$\hat{w} = (X^\top X)^{-1} X^\top y$$

The solution is unique since the Hessian is positive definite (provided $X$ has full column rank):

$$\nabla^2_w \mathcal{L}(w) = 2\,X^\top X \succ 0,$$

where $X \in \mathbb{R}^{n \times d}$ is the design matrix whose rows are the inputs $x_i^\top$ and $y \in \mathbb{R}^n$ is the vector of responses.
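A minimal NumPy sketch of solving the normal equation on synthetic data (all names, sizes, and the noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_true + noise (illustrative choices).
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Solve the normal equation X^T X w = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The Hessian X^T X is positive definite here (X has full column rank),
# so this stationary point is the unique minimizer.
eigvals = np.linalg.eigvalsh(X.T @ X)
assert np.all(eigvals > 0)

print(w_hat)  # close to w_true
```

In practice `np.linalg.lstsq` (which uses an SVD) is numerically safer than solving the normal equation directly, but the two agree here.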
| Algorithm | Lagrangian | Constrained quadratic program |
|---|---|---|
| lasso | $\min_w \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$ | $\min_w \lVert y - Xw \rVert_2^2 \ \text{s.t.}\ \lVert w \rVert_1 \le t$ |
| ridge | $\min_w \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2$ | $\min_w \lVert y - Xw \rVert_2^2 \ \text{s.t.}\ \lVert w \rVert_2^2 \le t$ |

For every $\lambda$ there is a corresponding $t$ such that the Lagrangian and constrained forms have the same solution.
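To make the Lagrangian forms concrete, here is a minimal NumPy sketch of the ridge solution (data and the value of $\lambda$ are illustrative); unlike the lasso, ridge is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

lam = 1.0  # illustrative regularization strength

# Ridge (Lagrangian form) has the closed-form solution
# w = (X^T X + lam * I)^{-1} X^T y.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Compare with the unregularized OLS solution.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficient vector toward zero.
assert np.linalg.norm(w_ridge) < np.linalg.norm(w_ols)
```

The lasso has no such closed form because the $\ell_1$ penalty is not differentiable at zero; it is typically solved by coordinate descent.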
Consider the partial derivatives of the lasso objective: the $\ell_1$ penalty is not differentiable at $w_j = 0$, so we must work with subgradients. Plot the values of the objective as a function of a single coefficient $w_j$ to see the kink at zero.
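Setting the subgradient of the lasso objective to zero, one coordinate at a time, yields the soft-thresholding operator; a minimal sketch (the function name is mine):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the minimizer of (w - z)**2 / 2 + t * |w|.

    For |z| <= t the subgradient condition is satisfied at w = 0,
    which is why the lasso sets coefficients exactly to zero.
    """
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.linspace(-3, 3, 7)
print(soft_threshold(z, 1.0))  # values in [-1, 1] are snapped exactly to zero
```

Coordinate descent for the lasso repeatedly applies this operator to the single-variable least-squares coefficient of each (standardized) feature.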
Regression splines work by fitting separate low-degree polynomials over different regions of the predictor's range. Example: a piecewise cubic polynomial with a single knot at a point $c$ fits one cubic to the observations with $x_i < c$ and another to those with $x_i \ge c$; each cubic has four coefficients, so the fit uses eight degrees of freedom. Using more knots leads to a more flexible piecewise polynomial.
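A sketch of the single-knot piecewise cubic on synthetic data (the data, knot location, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-2, 2, 200))
y = np.sin(2 * x) + 0.1 * rng.normal(size=x.size)  # illustrative data

knot = 0.0  # single knot (illustrative choice)

# Fit a separate cubic on each side of the knot.
# Two cubics x four coefficients each = eight degrees of freedom.
left = x < knot
coef_left = np.polyfit(x[left], y[left], deg=3)
coef_right = np.polyfit(x[~left], y[~left], deg=3)

def piecewise_fit(t):
    t = np.asarray(t, dtype=float)
    return np.where(t < knot, np.polyval(coef_left, t), np.polyval(coef_right, t))

# Compare against a single global cubic: the piecewise fit is more flexible.
coef_global = np.polyfit(x, y, deg=3)
mse_piecewise = np.mean((piecewise_fit(x) - y) ** 2)
mse_global = np.mean((np.polyval(coef_global, x) - y) ** 2)
```

Note that without further constraints the two cubics need not join continuously at the knot; splines add continuity (and derivative) constraints there.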
Local regression computes the fit at a target point $x_0$ using only the nearby training observations.

| Algorithm: Local Regression at $x_0$ |
|---|
| 1. Gather the fraction $s = k/n$ of training points whose $x_i$ are closest to $x_0$. |
| 2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in this neighborhood, so that the point furthest from $x_0$ has weight zero and the closest has the highest weight; all points outside the neighborhood get weight zero. |
| 3. Fit a weighted least squares regression of the $y_i$ on the $x_i$ using these weights, i.e. find $\hat\beta_0, \hat\beta_1$ minimizing $\sum_{i=1}^n K_{i0} (y_i - \beta_0 - \beta_1 x_i)^2$. |
| 4. The fitted value at $x_0$ is $\hat{f}(x_0) = \hat\beta_0 + \hat\beta_1 x_0$. |
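The four steps can be sketched in NumPy as follows (the tri-cube weight function is one common choice; the `span` default and function names are illustrative):

```python
import numpy as np

def local_regression(x0, x, y, span=0.3):
    """Local linear regression at a target point x0 (a sketch, not a reference loess)."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))        # step 1: the k = s*n nearest points
    dist = np.abs(x - x0)
    neighbors = np.argsort(dist)[:k]
    d_max = dist[neighbors].max()
    # Step 2: tri-cube weights -- zero at the furthest neighbor, largest nearby;
    # every point outside the neighborhood keeps weight zero.
    w = np.zeros(n)
    w[neighbors] = (1.0 - (dist[neighbors] / d_max) ** 3) ** 3
    # Step 3: weighted least squares of y on (1, x).
    A = np.column_stack([np.ones(n), x])
    beta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    # Step 4: evaluate the fitted local line at x0.
    return beta[0] + beta[1] * x0
```

Sliding `x0` over a grid of target points traces out the familiar loess-style smooth curve.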
GAMs automatically model non-linear relationships that standard linear regression would miss, without requiring us to try many transformations of each variable by hand. The non-linear fits can potentially make more accurate predictions for the response. Because the model is additive, we can examine the effect of each predictor on the response individually, holding the other variables fixed. The smoothness of the function fit to each variable can be summarized via its degrees of freedom.
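One classical way to fit a GAM is backfitting: repeatedly smooth each predictor against the partial residuals of the others. A minimal sketch using a Nadaraya-Watson smoother (the smoother choice, bandwidth, and all names are my illustrative assumptions, not a reference implementation):

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson (Gaussian kernel) smooth of residuals r against predictor x."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (W @ r) / W.sum(axis=1)

def backfit_gam(X, y, n_iter=20, bandwidth=0.3):
    """Fit y ~ alpha + sum_j f_j(x_j) by backfitting (a sketch)."""
    n, d = X.shape
    alpha = y.mean()
    F = np.zeros((n, d))                     # F[:, j] holds f_j at the data points
    for _ in range(n_iter):
        for j in range(d):
            # Partial residuals: remove every component except f_j.
            partial = y - alpha - F.sum(axis=1) + F[:, j]
            F[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            F[:, j] -= F[:, j].mean()        # center each f_j for identifiability
    return alpha, F
```

Each fitted column `F[:, j]` can be plotted against `X[:, j]` to examine that predictor's individual (non-linear) effect.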
Least squares corresponds to a Gaussian noise assumption, which makes the estimates sensitive to outliers. Robust regression: replace the Gaussian distribution for the response variable with a distribution that has heavy tails, such as the Student $t$ or Laplace distribution, which assigns higher likelihood to outliers without distorting the overall fit.
| Likelihood | Prior | Posterior | Name |
|---|---|---|---|
| Gaussian | Uniform | Point | Least squares |
| Student | Uniform | Point | Robust regression |
| Laplace | Uniform | Point | Robust regression |
| Gaussian | Gaussian | Point | Ridge |
| Gaussian | Laplace | Point | Lasso |
| Gaussian | Gauss-Gamma | Gauss-Gamma | Bayesian linear regression |
An alternative to the Laplace likelihood is the Huber loss. It is equivalent to the $\ell_2$ loss for errors smaller than a threshold $\delta$, and to the $\ell_1$ loss for larger errors. Unlike the $\ell_1$ loss, the Huber loss function is everywhere differentiable. Consequently, optimizing the Huber loss is much faster than using the Laplace likelihood, since standard smooth optimization methods apply.
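A sketch of the Huber loss, showing how the quadratic and linear pieces meet smoothly at $\delta$ (the default $\delta = 1$ is illustrative):

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear (slope delta) beyond."""
    quad = 0.5 * r ** 2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

r = np.linspace(-3, 3, 601)
loss = huber_loss(r, delta=1.0)

# At |r| = delta both the value (0.5 * delta**2) and the derivative (delta)
# of the two pieces agree, so the loss is differentiable everywhere.
# For large residuals it grows only linearly, like the l1 loss,
# which limits the influence of outliers.
```

Because the gradient is bounded by $\delta$, a single extreme outlier can only pull the fit by a fixed amount, unlike squared error.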