L02 Regressions

We will cover:

  • Algorithms
    • Linear regression
    • Penalized regression
    • Nonlinear regression
    • Robust linear regression
  • Coding with Python
  • Financial applications

Least squares linear regression

Terminology

  • Linear regression model: $p(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y \mid w_0 + \mathbf{w}^\top \mathbf{x}, \sigma^2)$

  • model parameters: $\boldsymbol{\theta} = (w_0, \mathbf{w}, \sigma^2)$
    • weights or regression coefficients: $\mathbf{w}$
    • offset or bias: $w_0$ ($b$)
    • by writing $\mathbf{x}$ as $[1, x_1, \dots, x_D]$, we can write $w_0 + \mathbf{w}^\top \mathbf{x}$ as $\mathbf{w}^\top \mathbf{x}$

  • simple linear regression: a single scalar input
  • multiple linear regression: multiple inputs, $\mathbf{x} \in \mathbb{R}^D$
  • multivariate linear regression: multiple outputs

  • if $y$ cannot be well fitted by a linear function of $\mathbf{x}$
    • apply a nonlinear transformation (feature extractor) $\boldsymbol{\phi}$ to $\mathbf{x}$: $p(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y \mid \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}), \sigma^2)$
    • as long as the parameters of $\boldsymbol{\phi}$ are fixed, the model remains linear in the parameters

Least squares estimation

  • minimizing the negative log likelihood (NLL):
    $\mathrm{NLL}(\mathbf{w}, \sigma^2) = -\sum_{n=1}^N \log \mathcal{N}(y_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma^2) = \frac{1}{2\sigma^2}\sum_{n=1}^N (y_n - \mathbf{w}^\top \mathbf{x}_n)^2 + \frac{N}{2}\log(2\pi\sigma^2)$

    • The MLE is the point where $\nabla_{\mathbf{w}} \mathrm{NLL}(\mathbf{w}, \sigma^2) = \mathbf{0}$
  • up to constants that do not depend on $\mathbf{w}$, minimizing the NLL is equivalent to minimizing the residual sum of squares (RSS): $\mathrm{RSS}(\mathbf{w}) = \sum_{n=1}^N (y_n - \mathbf{w}^\top \mathbf{x}_n)^2$

OLS

the normal equation (FOC): $\mathbf{X}^\top \mathbf{X}\,\mathbf{w} = \mathbf{X}^\top \mathbf{y}$

the OLS solution: $\hat{\mathbf{w}}_{\text{ols}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$

the solution is unique since the Hessian $\mathbf{X}^\top \mathbf{X}$ is positive definite (provided $\mathbf{X}$ has full column rank)
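
A minimal numerical sketch of the closed-form solution (synthetic data; the variable names and the use of np.linalg.solve on the normal equations are illustrative, not part of the lecture code):

import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, D))])  # prepend a column of ones for the bias
w_true = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)

# Solve the normal equations (X^T X) w = X^T y directly
# (more stable than forming the explicit inverse)
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to w_true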

Statistical Properties of OLS (finite sample)

  • unbiasedness: $\mathbb{E}[\hat{\mathbf{w}} \mid \mathbf{X}] = \mathbf{w}$
  • variance: $\mathrm{Var}(\hat{\mathbf{w}} \mid \mathbf{X}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$
  • Gauss-Markov: the OLS estimator is efficient in the class of linear unbiased estimators. That is, for any unbiased estimator $\tilde{\mathbf{w}}$ that is linear in $\mathbf{y}$, $\mathrm{Var}(\tilde{\mathbf{w}} \mid \mathbf{X}) \succeq \mathrm{Var}(\hat{\mathbf{w}} \mid \mathbf{X})$ in the matrix (positive semidefinite) sense.

Geometric interpretation of least squares

  • orthogonal projection: the fitted vector $\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{w}}$ is the orthogonal projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$

  • projection matrix (hat matrix): $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top$, so $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$

  • special case:

Weighted least squares

  • heteroskedastic regression: the noise variance depends on the observation, $p(y_n \mid \mathbf{x}_n, \boldsymbol{\theta}) = \mathcal{N}(y_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma_n^2)$

  • weighted linear regression: $p(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y} \mid \mathbf{X}\mathbf{w}, \boldsymbol{\Lambda}^{-1})$, where $\boldsymbol{\Lambda} = \mathrm{diag}(1/\sigma_n^2)$ is the precision matrix

  • MLE (weighted least squares estimate): $\hat{\mathbf{w}} = (\mathbf{X}^\top \boldsymbol{\Lambda}\mathbf{X})^{-1}\mathbf{X}^\top \boldsymbol{\Lambda}\mathbf{y}$
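
A small sketch of weighted least squares with statsmodels (synthetic heteroskedastic data; the noise model and the weights 1/sigma_n^2 are illustrative assumptions):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, n)
sigma = 0.5 + 0.3 * x                      # noise standard deviation grows with x
y = 1.0 + 2.0 * x + sigma * rng.normal(size=n)

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
wls_fit = sm.WLS(y, X, weights=1.0 / sigma**2).fit()  # weights proportional to 1/variance
print(ols_fit.params, wls_fit.params)                 # WLS gives lower-variance estimates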

Measuring goodness of fit

    • T-total: $\mathrm{TSS} = \sum_n (y_n - \bar{y})^2$
    • E-explained: $\mathrm{ESS} = \sum_n (\hat{y}_n - \bar{y})^2$
    • R-residual: $\mathrm{RSS} = \sum_n (y_n - \hat{y}_n)^2$
    • with an intercept, $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$, so $R^2 = \mathrm{ESS}/\mathrm{TSS} = 1 - \mathrm{RSS}/\mathrm{TSS}$
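
A short sketch verifying the sum-of-squares decomposition and $R^2$ on synthetic data (the data are an assumption; the check against statsmodels' rsquared is only for confirmation):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 3.0 + 1.5 * x + rng.normal(size=100)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
y_hat = fit.predict(X)

tss = np.sum((y - y.mean())**2)       # total sum of squares
ess = np.sum((y_hat - y.mean())**2)   # explained sum of squares
rss = np.sum((y - y_hat)**2)          # residual sum of squares
print(ess / tss, 1 - rss / tss, fit.rsquared)  # all three agree when an intercept is included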



Penalized (linear) regressions

Ridge regression

  • Ridge regression: MAP estimation with a zero-mean Gaussian prior on the weights, $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \tau^2 \mathbf{I})$

  • MAP estimate
    $\hat{\mathbf{w}}_{\text{map}} = \arg\min_{\mathbf{w}} \mathrm{RSS}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_2^2 = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^\top \mathbf{y}$

where $\lambda = \sigma^2/\tau^2$ is proportional to the strength of the prior, and $\lVert \mathbf{w} \rVert_2^2 = \sum_d w_d^2 = \mathbf{w}^\top \mathbf{w}$

  • this technique is called $\ell_2$ regularization or weight decay

Choosing the strength of the regularizer

  • the simple (but expensive) idea
    • try a finite number of distinct values
    • use cross-validation to estimate their expected loss (see the sketch after this list)
  • a practical method
    • start with a highly constrained model (strong regularizer)
    • gradually relax the constraints (decrease the amount of regularization)
  • empirical Bayes approach:
    • get the same result as the CV estimate
    • can be done by fitting a single model
    • use gradient-based optimization instead of discrete search
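
A sketch of the "finite grid + cross-validation" idea with scikit-learn (synthetic data; the grid of alphas and cv=5 are illustrative choices, and RidgeCV is shown only as a convenience wrapper for the same search):

import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 20))
w = np.zeros(20)
w[:5] = [1.0, -2.0, 0.5, 3.0, -1.5]
y = X @ w + rng.normal(size=200)

# Try a finite grid of regularization strengths and score each by cross-validation
alphas = np.logspace(-3, 3, 13)
cv_scores = [cross_val_score(Ridge(alpha=a), X, y, cv=5,
                             scoring='neg_mean_squared_error').mean()
             for a in alphas]
best_alpha = alphas[int(np.argmax(cv_scores))]

# RidgeCV performs an equivalent search (efficient leave-one-out CV by default)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
print(best_alpha, ridge_cv.alpha_)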

Lasso regression

  • least absolute shrinkage and selection operator (LASSO)

  • $\ell_1$-regularization: MAP estimation with a zero-mean Laplace prior on the weights, $p(\mathbf{w}) = \prod_d \mathrm{Lap}(w_d \mid 0, 1/\lambda)$, giving the objective $\mathrm{NLL}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_1$

  • other norms
    • in general, $\ell_q$ "norms" with $q < 1$:
      • even sparser solutions
      • but the problem becomes non-convex
    • $\ell_0$ (norm): $\lVert \mathbf{w} \rVert_0 = \sum_d \mathbb{I}(w_d \neq 0)$, the number of nonzero entries

Why does $\ell_1$ regularization yield sparse solutions?

  • Lagrangian vs. constrained quadratic program formulations
    • lasso: $\min_{\mathbf{w}} \mathrm{NLL}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_1$, equivalently $\min_{\mathbf{w}} \mathrm{NLL}(\mathbf{w})$ subject to $\lVert \mathbf{w} \rVert_1 \le B$

    • ridge: $\min_{\mathbf{w}} \mathrm{NLL}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_2^2$, equivalently $\min_{\mathbf{w}} \mathrm{NLL}(\mathbf{w})$ subject to $\lVert \mathbf{w} \rVert_2^2 \le B$

  • the $\ell_1$ constraint set is a diamond with corners on the coordinate axes, so the constrained optimum often lies at a corner where some coordinates are exactly zero; the $\ell_2$ ball has no corners, so ridge shrinks coefficients but rarely zeroes them

Hard vs soft thresholding

Consider the partial derivatives of the lasso objective $\mathcal{L}(\mathbf{w}) = \mathrm{NLL}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_1$

  • the NLL part
    • FOC: $\frac{\partial}{\partial w_d}\mathrm{NLL}(\mathbf{w}) = a_d w_d - c_d$, where $a_d = \sum_n x_{nd}^2$ and $c_d = \sum_n x_{nd}\big(y_n - \mathbf{w}_{-d}^\top \mathbf{x}_{n,-d}\big)$ measures how correlated feature $d$ is with the residual obtained using the other features

  • the solution: $\hat{w}_d = c_d / a_d$
  • adding the $\lambda \lVert \mathbf{w} \rVert_1$ part: the objective is non-differentiable at $w_d = 0$, so we use the subgradient $\partial_{w_d}\mathcal{L} = (a_d w_d - c_d) + \lambda\,\partial \lvert w_d \rvert$

  • the solution
    • If $c_d < -\lambda$, so the feature is strongly negatively correlated with the residual, then the subgradient is zero at $\hat{w}_d = (c_d + \lambda)/a_d < 0$.
    • If $c_d \in [-\lambda, \lambda]$, so the feature is only weakly correlated with the residual, then the subgradient is zero at $\hat{w}_d = 0$.
    • If $c_d > \lambda$, so the feature is strongly positively correlated with the residual, then the subgradient is zero at $\hat{w}_d = (c_d - \lambda)/a_d > 0$.

  • We can write this as $\hat{w}_d = \mathrm{SoftThreshold}(c_d/a_d,\ \lambda/a_d)$, where $\mathrm{SoftThreshold}(x, \delta) = \mathrm{sign}(x)\,(\lvert x \rvert - \delta)_+$
  • hard thresholding:
    • sets $\hat{w}_d = 0$ for $\lvert c_d \rvert \le \lambda$
    • does not shrink the values of $\hat{w}_d$ for the other cases
  • debiasing: the two-stage estimation process (see the sketch after this list)
    • run lasso to get the sparse estimate (variable selection)
    • run OLS using only the variables selected by the lasso
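
A sketch of soft thresholding and the two-stage debiasing procedure with scikit-learn (synthetic sparse data; alpha=0.1 and the true support are illustrative assumptions, not values from the lecture):

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def soft_threshold(x, delta):
    """SoftThreshold(x, delta) = sign(x) * max(|x| - delta, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[[3, 17, 42]] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.5 * rng.normal(size=200)

# Stage 1: lasso selects variables but shrinks their coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Stage 2: debias by refitting OLS on the selected variables only
ols = LinearRegression().fit(X[:, support], y)
print(soft_threshold(np.array([-2.0, 0.3, 1.5]), 0.5))
print(support, ols.coef_)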

Regularization path

Plot the values $\hat{w}_d(\lambda)$ vs $\lambda$ (or vs the bound $B$) for each feature $d$.

Group lasso

  • group sparsity
    • many parameters may be associated with a given input variable
    • a vector of weights $\mathbf{w}_j$ for variable $j$
    • if we want to exclude variable $j$, we have to force the whole subvector $\mathbf{w}_j$ to go to zero (see the sketch after this list)
  • applications
    • Linear regression with categorical inputs: if the $j$'th variable is categorical with $K$ possible levels, it is represented as a one-hot vector of length $K$, so to exclude variable $j$ we have to set the whole vector of incoming weights to zero.
    • Multinomial logistic regression: the $j$'th variable will be associated with $C$ different weights, one per class, so to exclude variable $j$ we have to set the whole vector of outgoing weights to zero.
    • Neural networks: the $j$'th neuron will have multiple inputs, so if we want to "turn the neuron off", we have to set all of its incoming weights to zero. This allows us to use group sparsity to learn neural network structure.
    • Multi-task learning: each input feature is associated with $T$ different weights, one per output task. If we want to use a feature for all of the tasks or none of them, we should select weights at the group level.
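
Group lasso is not implemented in scikit-learn itself; the sketch below only shows the block (group) soft-thresholding operator that proximal-gradient solvers apply to each group's weight subvector, under the assumption that groups are given as index lists (all names here are illustrative):

import numpy as np

def group_soft_threshold(w, groups, delta):
    """Shrink each group's subvector toward zero; zero it out entirely
    when its Euclidean norm is below delta."""
    w = w.astype(float).copy()
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        w[idx] = 0.0 if norm <= delta else (1.0 - delta / norm) * w[idx]
    return w

w = np.array([0.2, -0.1, 3.0, -2.0, 0.05])
groups = [[0, 1], [2, 3], [4]]
print(group_soft_threshold(w, groups, delta=0.5))
# small groups are set exactly to zero; the large group is only shrunk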

Elastic net (ridge and lasso combined)
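
A minimal scikit-learn sketch of combining the two penalties (synthetic data; note that sklearn parameterizes the mix through alpha and l1_ratio rather than two separate penalty strengths, and the grids below are illustrative):

import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 30))
w_true = np.zeros(30)
w_true[:4] = [1.5, -2.0, 0.0, 3.0]
y = X @ w_true + 0.5 * rng.normal(size=200)

# l1_ratio interpolates between ridge-like (near 0) and lasso-like (1) penalties;
# both it and the overall strength are chosen by cross-validation
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 0.95], cv=5).fit(X, y)
print(enet.alpha_, enet.l1_ratio_, np.flatnonzero(enet.coef_))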

Nonlinear regression

Polynomial regression

  • nonlinear in the input $x$ (uses powers $x, x^2, \dots, x^d$ as features)
  • still linear in the parameters (the $\beta$'s)
  • can be fit exactly like multiple linear regression
  • the polynomial function imposes global structure on the estimated fit

Coding: Polynomial regression



  • imports
import numpy as np
import pandas as pd
import patsy as pt
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
import warnings
warnings.filterwarnings('ignore')
  • load wage dataset
wage_df = pd.read_csv('./data/Wage.csv')
wage_df = wage_df.drop(wage_df.columns[0], axis=1)
wage_df['education'] = wage_df['education'].map({'1. < HS Grad': 1.0, 
                                                 '2. HS Grad': 2.0, 
                                                 '3. Some College': 3.0,
                                                 '4. College Grad': 4.0,
                                                 '5. Advanced Degree': 5.0
                                                })
wage_df.head()
  • preview of the dataset

   year  age  maritl            race      education  region              jobclass        health          health_ins  logwage   wage
0  2006   18  1. Never Married  1. White  1.0        2. Middle Atlantic  1. Industrial   1. <=Good       2. No       4.318063  …
1  2004   24  1. Never Married  1. White  4.0        2. Middle Atlantic  2. Information  2. >=Very Good  2. No       4.255273  …
2  2003   45  2. Married        1. White  3.0        2. Middle Atlantic  1. Industrial   1. <=Good       1. Yes      4.875061  …
3  2003   43  2. Married        3. Asian  4.0        2. Middle Atlantic  2. Information  2. >=Very Good  1. Yes      5.041393  …
4  2005   50  4. Divorced       1. White  2.0        2. Middle Atlantic  2. Information  1. <=Good       1. Yes      4.318063  …
  • Regression & results

# Derive 4 degree polynomial features of age
degree = 4
f = ' + '.join(['np.power(age, {})'.format(i)
                for i in np.arange(1, degree+1)])
X = pt.dmatrix(f, wage_df)
y = np.asarray(wage_df['wage'])

# Fit linear model
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
model.summary()
  • Plots

# STATS
# ----------------------------------
# Covariance of coefficient estimates
mse = np.sum(np.square(y_hat - y)) / y.size
cov = mse * np.linalg.inv(X.T @ X)
# ...or alternatively this stat is provided by stats models:
#cov = model.cov_params()

# Calculate variance of f(x)
var_f = np.diagonal((X @ cov) @ X.T)
# Derive standard error of f(x) from variance
se       = np.sqrt(var_f)
conf_int = 2*se

# PLOT
# ----------------------------------
# Setup axes
fig, ax = plt.subplots(figsize=(10,10))

# Plot datapoints
sns.scatterplot(x='age', y='wage',
                color='tab:gray',
                alpha=0.2,
                ax=ax,
                data=pd.concat([wage_df['age'], wage_df['wage']], axis=1));

# Plot estimated f(x)
sns.lineplot(x=X[:, 1], y=y_hat, ax=ax, color='blue');

# Plot confidence intervals
sns.lineplot(x=X[:, 1], y=y_hat+conf_int, color='blue');
sns.lineplot(x=X[:, 1], y=y_hat-conf_int, color='blue');
# dash the confidence-interval lines
ax.lines[1].set_linestyle("--")
ax.lines[2].set_linestyle("--")

Step functions

  • no global structure
  • break the range of into bins

  • fit a different constant in each bin

  • unless there are natural breakpoints in the predictors, piecewise-constant functions can miss the action

Coding: Step function

### Step function
steps = 6

# Segment age into `steps` equal-width bins
cuts = pd.cut(wage_df['age'], steps)
X = np.asarray(pd.get_dummies(cuts)).astype(float)
y = np.asarray(wage_df['wage'])

# Fit linear regression model (one constant per bin, no separate intercept)
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)


# PLOT
# ----------------------------------
# Setup axes
fig, ax = plt.subplots(figsize=(10,10))

# Plot datapoints
sns.scatterplot(x='age', y='wage',
                color='tab:gray',
                alpha=0.2,
                ax=ax,
                data=pd.concat([wage_df['age'], 
                wage_df['wage']], axis=1));

# Plot estimated f(x)
sns.lineplot(x=wage_df['age'], y=y_hat, ax=ax, color='blue')

Basis functions

  • Polynomial and piecewise-constant regression models are special cases of a basis function approach.
  • Basis functions: a family of functions or transformations that can be applied to a variable $X$: $b_1(X), b_2(X), \dots, b_K(X)$.
  • The model
    $y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \dots + \beta_K b_K(x_i) + \epsilon_i$

  • some examples of basis functions
    • polynomial functions
    • piecewise constant functions
    • wavelets
    • Fourier series
    • splines

Regression splines

Piecewise Polynomials

  • fitting separate low-degree polynomials over different regions of $X$

  • example: piecewise cubic polynomial with a single knot at a point $c$:
    $y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c \end{cases}$

  • degrees of freedom: each cubic uses 4 parameters, so the example above uses 8 degrees of freedom

  • using more knots leads to a more flexible piecewise polynomial

Constraints and Splines

  • piecewise cubic: no constraint at the knots
  • continuous piecewise cubic: continuity of $f$ at each knot
  • cubic spline: continuity of $f$, $f'$, and $f''$ at each knot
  • degree-$d$ spline: a piecewise degree-$d$ polynomial, with continuity in derivatives up to degree $d-1$ at each knot

The Spline Basis Representation

  • regression spline (cubic spline with $K$ knots):
    $y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \dots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i$

  • the spline basis
    • polynomial basis: $x$, $x^2$, and $x^3$
    • one truncated power basis function per knot $\xi$: $h(x, \xi) = (x - \xi)_+^3$

  • splines can have high variance at the outer range of the predictors
  • natural spline
    • a regression spline with additional boundary constraints
    • the function is required to be linear at the boundary
    • natural splines generally produce more stable estimates at the boundaries

Choosing the Number and Locations of the Knots

  • locations of knots (given the number fixed)
    • place more knots where the function might vary most rapidly, and fewer knots where it seems more stable
    • in practice: place knots in a uniform fashion
      • specify the desired degrees of freedom
      • the software automatically places the knots at uniform quantiles of the data
  • number of knots
    • try out different numbers of knots
    • cross-validation

Comparison to Polynomial Regression

  • natural cubic spline with 15 degrees of freedom vs. degree-15 polynomial
  • natural cubic spline works better on boundaries
  • in general, natural cubic spline produces more stable estimates

Smoothing splines

An Overview of Smoothing Splines

  • fitting a curve $g(x)$: we want the RSS $\sum_{i=1}^n (y_i - g(x_i))^2$ to be small
  • but $g$ should also be a smooth function (WHY? & HOW?)
  • a smoothing spline minimizes the following objective
    $\sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g''(t)^2\,dt$

  • the smoothing spline is a natural cubic spline with knots at $x_1, \dots, x_n$
    • piecewise cubic polynomial with knots at the unique values of $x_1, \dots, x_n$
    • continuous first and second derivatives at each knot
    • linear in the region outside of the extreme knots
    • it is a shrunken version of such a natural cubic spline, with $\lambda$ controlling the amount of shrinkage
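
A small smoothing-spline sketch using scipy's UnivariateSpline (synthetic data; note that scipy controls smoothness through a residual bound s rather than the penalty $\lambda$ above, so this is an analogous but not identical parameterization):

import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.3 * rng.normal(size=200)

# Larger s allows a larger residual sum of squares, hence a smoother (more shrunken) fit
spl = UnivariateSpline(x, y, k=3, s=len(x) * 0.3**2)
x_grid = np.linspace(0, 10, 200)
y_smooth = spl(x_grid)
print(y_smooth[:5])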

Coding: Regression spline

# Putting confidence interval calcs into function for convenience.
def confidence_interval(X, y, y_hat):
    """Compute 5% confidence interval for linear regression"""
    # STATS
    # ----------------------------------    
    # Covariance of coefficient estimates
    mse = np.sum(np.square(y_hat - y)) / y.size
    cov = mse * np.linalg.inv(X.T @ X)
    # ...or alternatively this stat is provided by stats models:
    #cov = model.cov_params()
    
    # Calculate variance of f(x)
    var_f = np.diagonal((X @ cov) @ X.T)
    # Derive standard error of f(x) from variance
    se       = np.sqrt(var_f)
    conf_int = 2*se
    return conf_int

# Fit a cubic regression spline basis (df=7, including the intercept)

# Use patsy to generate entire matrix of basis functions
X = pt.dmatrix('bs(age, df=7, degree=3, include_intercept=True)', wage_df)
y = np.asarray(wage_df['wage'])

# Fit linear regression model
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
conf_int = confidence_interval(X, y, y_hat)

# PLOT
# ----------------------------------
# Setup axes
fig, ax = plt.subplots(figsize=(10,10))

# Plot datapoints
sns.scatterplot(x='age', y='wage',
                color='tab:gray',
                alpha=0.2,
                ax=ax,
                data=pd.concat([wage_df['age'], wage_df['wage']], axis=1));

# Plot estimated f(x)
sns.lineplot(x=wage_df['age'], y=y_hat, ax=ax, color='blue');

# Plot confidence intervals
sns.lineplot(x=wage_df['age'], y=y_hat+conf_int, color='blue');
sns.lineplot(x=wage_df['age'], y=y_hat-conf_int, color='blue');
# dash the confidence-interval lines
ax.lines[1].set_linestyle("--")
ax.lines[2].set_linestyle("--")

Coding: Natural spline

# Fit a natural spline with seven degrees of freedom

# Use patsy to generate entire matrix of basis functions
X = pt.dmatrix('cr(age, df=7)', wage_df)     
# REVISION NOTE: Something funky happens when df=6

y = np.asarray(wage_df['wage'])

# Fit linear regression model
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
conf_int = confidence_interval(X, y, y_hat)

# PLOT
# ----------------------------------
# Setup axes
fig, ax = plt.subplots(figsize=(10,10))

# Plot datapoints
sns.scatterplot(x='age', y='wage',
                color='tab:gray',
                alpha=0.2,
                ax=ax,
                data=pd.concat([wage_df['age'], wage_df['wage']], axis=1));

# Plot estimated f(x)
sns.lineplot(x=wage_df['age'], y=y_hat, ax=ax, color='blue');

# Plot confidence intervals
sns.lineplot(x=wage_df['age'], y=y_hat+conf_int, color='blue');
sns.lineplot(x=wage_df['age'], y=y_hat-conf_int, color='blue');
# dash the confidence-interval lines
ax.lines[1].set_linestyle("--")
ax.lines[2].set_linestyle("--")

Local regression

computing the fit at a target point using only the nearby training observations

Algorithm

Algorithm: Local Regression at $x_0$
1. Gather the fraction $s = k/n$ of training points whose $x_i$ are closest to $x_0$.
2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in this neighborhood, so that the point furthest from $x_0$ has weight zero and the closest has the highest weight. All but these $k$ nearest neighbors get weight zero.
3. Fit a weighted least squares regression of the $y_i$ on the $x_i$ using the aforementioned weights, by finding $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize $\sum_{i=1}^n K_{i0}(y_i - \beta_0 - \beta_1 x_i)^2$.
4. The fitted value at $x_0$ is given by $\hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0$.

Local linear regression
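
A minimal local (linear) regression sketch using statsmodels' lowess, the same smoother used in the GAM code later in this lecture (synthetic data; frac=0.3 is an illustrative span):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + 0.3 * rng.normal(size=300)

# frac is the span: the fraction of nearby points used for each local weighted fit
smoothed = sm.nonparametric.lowess(y, x, frac=0.3)  # returns (x, fitted) pairs sorted by x
print(smoothed[:5])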

Generalized additive models

GAMs for Regression Problems

  • the multiple linear regression model
    $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i$

  • GAM: replace each linear component $\beta_j x_{ij}$ with a smooth nonlinear function $f_j(x_{ij})$
    $y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip}) + \epsilon_i$
An Example (using natural spline)

  • year and age are quantitative variables
  • education is qualitative with five levels: <HS, HS, <Coll, Coll, >Coll
  • model with natural splines:
    $\text{wage} = \beta_0 + f_1(\text{year}) + f_2(\text{age}) + f_3(\text{education}) + \epsilon$, where $f_1$ and $f_2$ are natural splines and $f_3$ is fit via dummy variables

An Example (using smoothing spline)

Pros and Cons of GAMs

  • Pros
    • GAMs automatically model non-linear relationships that standard linear regression will miss.

    • The non-linear fits can potentially make more accurate predictions for the response $Y$.

    • We can examine the effect of each $X_j$ on $Y$ individually while holding all of the other variables fixed.

    • The smoothness of the function $f_j$ for the variable $X_j$ can be summarized via degrees of freedom.

  • Cons: the model is restricted to be additive.

Coding: GAM


# Use patsy to generate entire matrix of basis functions
X = pt.dmatrix('cr(year, df=4)+cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])

# Fit linear regression model
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
conf_int = confidence_interval(X, y, y_hat)
# Plot estimated f(year)
sns.lineplot(x=wage_df['year'], y=y_hat)
# Plot estimated f(age)
sns.lineplot(x=wage_df['age'], y=y_hat);
  • Comparing GAM configurations with ANOVA
# Model 1
X = pt.dmatrix('cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
model1 = sm.OLS(y, X).fit(disp=0)
# Model 2
X = pt.dmatrix('year+cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
model2 = sm.OLS(y, X).fit(disp=0)
# Model 3
X = pt.dmatrix('cr(year, df=4)+cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
model3 = sm.OLS(y, X).fit(disp=0)

# Compare models with ANOVA
display(sm.stats.anova_lm(model1, model2, model3))
   df_resid           ssr  df_diff       ss_diff          F  Pr(>F)
0    2994.0  3.750437e+06      0.0           NaN        NaN     NaN
1    2993.0  3.732809e+06      1.0  17627.473318  14.129318
2    2991.0  3.731516e+06      2.0   1293.696286   0.518482
display(model3.summary())
  • Local regression GAM

x = np.asarray(wage_df['age'])
y = np.asarray(wage_df['wage'])
# Create lowess feature for age
wage_df['age_lowess'] = sm.nonparametric.lowess(
    y, x, frac=.7, return_sorted=False)

# Fit linear regression model using the smoothed age feature
X = pt.dmatrix('cr(year, df=4) + age_lowess + education', wage_df)
y = np.asarray(wage_df['wage'])
model = sm.OLS(y, X).fit()
model.summary()

Robust linear regression

  • Gaussian noise assumption: $p(y \mid \mathbf{x}, \boldsymbol{\theta}) = \mathcal{N}(y \mid \mathbf{w}^\top \mathbf{x}, \sigma^2)$

    • OLS estimator = MLE
    • poor performance in the presence of outliers
  • Robust regression: replace the Gaussian distribution for the response
    variable with a distribution that has heavy tails

Likelihood    Prior        Posterior    Name
Gaussian      Uniform      Point        Least squares
Student-$t$   Uniform      Point        Robust regression
Laplace       Uniform      Point        Robust regression
Gaussian      Gaussian     Point        Ridge
Gaussian      Laplace      Point        Lasso
Gaussian      Gauss-Gamma  Gauss-Gamma  Bayesian linear regression

Laplace likelihood

  • $p(y \mid \mathbf{x}, \mathbf{w}, b) = \mathrm{Lap}(y \mid \mathbf{w}^\top \mathbf{x}, b) \propto \exp\!\big(-\tfrac{1}{b}\lvert y - \mathbf{w}^\top \mathbf{x} \rvert\big)$

Student-$t$ likelihood

  • $p(y \mid \mathbf{x}, \mathbf{w}, \sigma^2, \nu) = \mathcal{T}(y \mid \mathbf{w}^\top \mathbf{x}, \sigma^2, \nu)$
  • We can fit this model using SGD or EM

Huber loss

  • An alternative to minimizing the NLL using a Laplace or Student-$t$ likelihood is to use the Huber loss:
    $\ell_\delta(r) = \begin{cases} r^2/2 & \text{if } \lvert r \rvert \le \delta \\ \delta \lvert r \rvert - \delta^2/2 & \text{if } \lvert r \rvert > \delta \end{cases}$

  • It is equivalent to the $\ell_2$ loss for errors that are smaller than $\delta$, and equivalent to the $\ell_1$ loss for larger errors.

  • The Huber loss function is everywhere differentiable.

  • Consequently, optimizing the Huber loss is much faster than using the Laplace likelihood.

  • $\delta$ controls the degree of robustness (see the sketch below)

    • set by hand
    • or by cross-validation
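
A sketch contrasting OLS with a Huber-loss fit on data containing outliers (synthetic data; scikit-learn's HuberRegressor also estimates a scale parameter, so its epsilon plays a role analogous to, but not identical with, the $\delta$ above):

import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
y[:10] += 50.0                     # inject a few large outliers

X = x.reshape(-1, 1)
ols = LinearRegression().fit(X, y)
huber = HuberRegressor(epsilon=1.35).fit(X, y)
print(ols.coef_, huber.coef_)      # the Huber fit is much less affected by the outliers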

Applications of Regressions in Finance