the normal equation (FOC): $X^\top X \mathbf{w} = X^\top \mathbf{y}$
the OLS solution: $\hat{\mathbf{w}}_{\text{OLS}} = (X^\top X)^{-1} X^\top \mathbf{y}$
the solution is unique since the Hessian $X^\top X$ is positive definite (when $X$ has full column rank)
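A minimal sketch of solving the normal equation directly on synthetic data (the data here is illustrative, not the Wage data used later); in practice a QR- or SVD-based solver such as `np.linalg.lstsq` is preferred for numerical stability:

```python
import numpy as np

# Illustrative synthetic data: 100 observations, intercept plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
w_true = np.array([1.0, 2.0, -0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

# Normal equation: solve X'X w = X'y (unique solution when X'X is positive definite)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer alternative based on an SVD of X
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```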
Algorithm | Lagrangian | Constrained quadratic program |
---|---|---|
lasso | $\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_1$ | $\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 \ \text{subject to} \ \lVert \mathbf{w} \rVert_1 \le t$ |
ridge | $\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2$ | $\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 \ \text{subject to} \ \lVert \mathbf{w} \rVert_2^2 \le t$ |
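A hedged sketch of fitting the penalized (Lagrangian) forms with statsmodels' elastic-net solver on synthetic data; `L1_wt=1.0` gives the lasso penalty and `L1_wt=0.0` the ridge penalty, with `alpha` playing the role of $\lambda$ (up to statsmodels' internal scaling):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(200, 10)))
y = X[:, 1] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Lasso: L1-penalized least squares; several coefficients are driven exactly to zero
lasso_fit = sm.OLS(y, X).fit_regularized(method='elastic_net', alpha=0.1, L1_wt=1.0)
# Ridge: L2-penalized least squares; coefficients are shrunk but generally stay nonzero
ridge_fit = sm.OLS(y, X).fit_regularized(method='elastic_net', alpha=0.1, L1_wt=0.0)
print(lasso_fit.params)
print(ridge_fit.params)
```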
Consider the partial derivatives of the lasso objective with respect to a single coefficient: the $\ell_1$ penalty is not differentiable at zero, and setting the subgradient to zero gives a soft-thresholding update.
Plot the regularized coefficient values against the unpenalized (OLS) estimate to compare lasso and ridge shrinkage (see the sketch below).
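A minimal sketch of that comparison, assuming a single standardized feature so the unpenalized estimate is just the OLS coefficient $c$ (`soft_threshold` is an illustrative helper, not a library function):

```python
import numpy as np
import matplotlib.pyplot as plt

def soft_threshold(c, lam):
    # Lasso update: shrink the OLS coefficient c toward zero by lam,
    # and set it exactly to zero when |c| <= lam
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

c = np.linspace(-3, 3, 201)
lam = 1.0
plt.plot(c, c, '--', label='OLS (no penalty)')
plt.plot(c, soft_threshold(c, lam), label='lasso (soft thresholding)')
plt.plot(c, c / (1 + lam), label='ridge (proportional shrinkage)')
plt.xlabel('OLS coefficient')
plt.ylabel('regularized coefficient')
plt.legend()
plt.show()
```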
year | age | maritl | race | education | region | jobclass | health | health_ins | logwage | wage |
---|---|---|---|---|---|---|---|---|---|---|
2006 | 18 | 1. Never Married | 1. White | 1.0 | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 2. No | 4.318063 | 75.043154 |
2004 | 24 | 1. Never Married | 1. White | 4.0 | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 2. No | 4.255273 | 70.476020 |
2003 | 45 | 2. Married | 1. White | 3.0 | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 1. Yes | 4.875061 | 130.982177 |
2003 | 43 | 2. Married | 3. Asian | 4.0 | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 5.041393 | 154.685293 |
2005 | 50 | 4. Divorced | 1. White | 2.0 | 2. Middle Atlantic | 2. Information | 1. <=Good | 1. Yes | 4.318063 | 75.043154 |
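A minimal sketch of producing the preview above, assuming the ISLR Wage data is available locally as `Wage.csv` (the file path is an assumption):

```python
import pandas as pd

# Path is an assumption; the Wage data ships with the ISLR resources
wage_df = pd.read_csv('Wage.csv')
display(wage_df.head())
```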
fitting separate low-degree polynomials over different regions of the predictor's range
example: a piecewise cubic polynomial with a single knot at a point $c$ fits one cubic for $x < c$ and another for $x \ge c$, using $2 \times 4 = 8$ degrees of freedom
Using more knots leads to a more flexible piecewise polynomial (a spline sketch with patsy follows below)
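A hedged sketch of a cubic spline fit of wage on age using patsy's B-spline basis `bs()`; the knot locations (25, 40, 60) are illustrative, and `wage_df` is the data frame previewed above:

```python
import numpy as np
import patsy as pt
import statsmodels.api as sm

# Cubic spline basis for age; the three interior knots are illustrative choices
X = pt.dmatrix('bs(age, knots=(25, 40, 60), degree=3, include_intercept=False)', wage_df)
y = np.asarray(wage_df['wage'])
spline_model = sm.OLS(y, X).fit()

# Evaluate the fitted curve on a grid of ages using the same basis
age_grid = np.linspace(wage_df['age'].min(), wage_df['age'].max(), 100)
X_grid = pt.build_design_matrices([X.design_info], {'age': age_grid})[0]
wage_hat = spline_model.predict(X_grid)
```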
Local regression computes the fit at a target point $x_0$ using only the nearby training observations.

Algorithm: Local Regression at $x_0$ |
---|
1. Gather the fraction $s = k/n$ of training points whose $x_i$ are closest to $x_0$. |
2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in this neighborhood, so that the point furthest from $x_0$ has weight zero and the closest has the highest weight; all points outside the neighborhood get weight zero. |
3. Fit a weighted least squares regression of the $y_i$ on the $x_i$ using these weights, i.e. find $\hat{\beta}_0, \hat{\beta}_1$ minimizing $\sum_{i=1}^{n} K_{i0}\,(y_i - \beta_0 - \beta_1 x_i)^2$. |
4. The fitted value at $x_0$ is $\hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0$. |
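A minimal sketch using statsmodels' LOWESS implementation, which performs a locally weighted linear fit of this kind (with a tricube kernel); the span `frac` plays the role of the fraction $s$ above, and `wage_df` is assumed loaded as before:

```python
import numpy as np
import statsmodels.api as sm

x = np.asarray(wage_df['age'], dtype=float)
y = np.asarray(wage_df['wage'], dtype=float)

# Each fitted value uses the nearest frac*n points, weighted by distance from the target
smoothed = sm.nonparametric.lowess(y, x, frac=0.5)
age_grid, wage_fit = smoothed[:, 0], smoothed[:, 1]  # sorted ages and their local fits
```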
GAMs automatically model non-linear relationships that standard linear regression will miss.
The non-linear fits can potentially make more accurate predictions for the response.
We can examine the effect of each predictor on the response individually, holding the other variables fixed.
The smoothness of the function for each predictor can be summarized via its degrees of freedom.
# Use patsy to generate the entire matrix of basis functions
import numpy as np
import patsy as pt
import statsmodels.api as sm

X = pt.dmatrix('cr(year, df=4) + cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
# Fit the GAM as a linear regression (OLS) on the basis expansion
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
# confidence_interval is a user-defined helper (defined elsewhere in these notes)
conf_int = confidence_interval(X, y, y_hat)
# Model 1: age spline + education only
X = pt.dmatrix('cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
model1 = sm.OLS(y, X).fit()
# Model 2: add year as a linear term
X = pt.dmatrix('year + cr(age, df=5) + education', wage_df)
model2 = sm.OLS(y, X).fit()
# Model 3: add year as a spline
X = pt.dmatrix('cr(year, df=4) + cr(age, df=5) + education', wage_df)
model3 = sm.OLS(y, X).fit()
# Compare the nested models with an F-test (ANOVA)
display(sm.stats.anova_lm(model1, model2, model3))
 | df_resid | ssr | df_diff | ss_diff | F | Pr(>F) |
---|---|---|---|---|---|---|
0 | 2994.0 | 3.750437e+06 | 0.0 | NaN | NaN | |
1 | 2993.0 | 3.732809e+06 | 1.0 | 17627.473318 | 14.129318 | |
2 | 2991.0 | 3.731516e+06 | 2.0 | 1293.696286 | 0.518482 | |
Standard linear regression relies on a Gaussian noise assumption, which makes the fitted parameters sensitive to outliers.
Robust regression: replace the Gaussian distribution for the response variable with a distribution that has heavy tails.
Likelihood | Prior | Posterior | Name |
---|---|---|---|
Gaussian | Uniform | Point | Least squares |
Student | Uniform | Point | Robust regression |
Laplace | Uniform | Point | Robust regression |
Gaussian | Gaussian | Point | Ridge |
Gaussian | Laplace | Point | Lasso |
Gaussian | Gauss-Gamma | Gauss-Gamma | Bayesian linear regression |
An alternative to a heavy-tailed likelihood is to minimize the Huber loss, $L_\delta(r) = r^2/2$ for $|r| \le \delta$ and $\delta \lvert r \rvert - \delta^2/2$ otherwise. It is equivalent to the $\ell_2$ loss for errors smaller than $\delta$, and to the $\ell_1$ loss for larger errors.
The Huber loss function is everywhere differentiable.
Consequently, optimizing the Huber loss is much faster than using the Laplace likelihood, since smooth optimization methods can be used in place of linear programming.
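A hedged sketch comparing ordinary least squares with two robust alternatives in statsmodels: the Huber loss via `RLM` (iteratively reweighted least squares) and median regression via `QuantReg`, which corresponds to the Laplace likelihood; the contaminated synthetic data is illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data with a handful of gross outliers (illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)
y[:5] += 15.0  # contaminate a few responses

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()                                # pulled toward the outliers
huber_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber loss via IRLS
lad_fit = sm.QuantReg(y, X).fit(q=0.5)                      # Laplace likelihood / LAD

print(ols_fit.params)
print(huber_fit.params)
print(lad_fit.params)
```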