Minimizing the least squares objective $\mathcal{L}(w) = \lVert y - Xw \rVert_2^2$ and setting the gradient to zero gives the normal equation (FOC):

$$X^\top X \hat{w} = X^\top y$$

Solving for $\hat{w}$ gives the OLS solution:

$$\hat{w} = (X^\top X)^{-1} X^\top y$$

The solution is unique since the Hessian is positive definite (provided $X$ has full column rank):

$$\nabla^2_w \mathcal{L}(w) = 2\,X^\top X \succ 0,$$

where $X \in \mathbb{R}^{n \times d}$ is the design matrix whose rows are the inputs $x_i^\top$ and $y \in \mathbb{R}^n$ is the vector of responses.
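A minimal NumPy sketch of solving the normal equation on synthetic data (all names, sizes, and the noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_true + noise (illustrative choices).
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Solve the normal equation X^T X w = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The Hessian X^T X is positive definite here (X has full column rank),
# so this stationary point is the unique minimizer.
eigvals = np.linalg.eigvalsh(X.T @ X)
assert np.all(eigvals > 0)

print(w_hat)  # close to w_true
```

In practice `np.linalg.lstsq` (which uses an SVD) is numerically safer than solving the normal equation directly, but the two agree here.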
| Algorithm | Lagrangian | Constrained quadratic program |
|---|---|---|
| lasso | $\min_w \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$ | $\min_w \lVert y - Xw \rVert_2^2 \ \text{s.t.}\ \lVert w \rVert_1 \le t$ |
| ridge | $\min_w \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2$ | $\min_w \lVert y - Xw \rVert_2^2 \ \text{s.t.}\ \lVert w \rVert_2^2 \le t$ |

For every $\lambda$ there is a corresponding $t$ such that the Lagrangian and constrained forms have the same solution.
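To make the Lagrangian forms concrete, here is a minimal NumPy sketch of the ridge solution (data and the value of $\lambda$ are illustrative); unlike the lasso, ridge is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

lam = 1.0  # illustrative regularization strength

# Ridge (Lagrangian form) has the closed-form solution
# w = (X^T X + lam * I)^{-1} X^T y.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Compare with the unregularized OLS solution.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficient vector toward zero.
assert np.linalg.norm(w_ridge) < np.linalg.norm(w_ols)
```

The lasso has no such closed form because the $\ell_1$ penalty is not differentiable at zero; it is typically solved by coordinate descent.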
Consider the partial derivatives of the lasso objective: the $\ell_1$ penalty is not differentiable at $w_j = 0$, so we must work with subgradients. Plot the values of the objective as a function of a single coefficient $w_j$ to see the kink at zero.
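Setting the subgradient of the lasso objective to zero, one coordinate at a time, yields the soft-thresholding operator; a minimal sketch (the function name is mine):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the minimizer of (w - z)**2 / 2 + t * |w|.

    For |z| <= t the subgradient condition is satisfied at w = 0,
    which is why the lasso sets coefficients exactly to zero.
    """
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.linspace(-3, 3, 7)
print(soft_threshold(z, 1.0))  # values in [-1, 1] are snapped exactly to zero
```

Coordinate descent for the lasso repeatedly applies this operator to the single-variable least-squares coefficient of each (standardized) feature.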
Regression splines work by fitting separate low-degree polynomials over different regions of the predictor's range. Example: a piecewise cubic polynomial with a single knot at a point $c$ fits one cubic to the observations with $x_i < c$ and another to those with $x_i \ge c$; each cubic has four coefficients, so the fit uses eight degrees of freedom. Using more knots leads to a more flexible piecewise polynomial.
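A sketch of the single-knot piecewise cubic on synthetic data (the data, knot location, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-2, 2, 200))
y = np.sin(2 * x) + 0.1 * rng.normal(size=x.size)  # illustrative data

knot = 0.0  # single knot (illustrative choice)

# Fit a separate cubic on each side of the knot.
# Two cubics x four coefficients each = eight degrees of freedom.
left = x < knot
coef_left = np.polyfit(x[left], y[left], deg=3)
coef_right = np.polyfit(x[~left], y[~left], deg=3)

def piecewise_fit(t):
    t = np.asarray(t, dtype=float)
    return np.where(t < knot, np.polyval(coef_left, t), np.polyval(coef_right, t))

# Compare against a single global cubic: the piecewise fit is more flexible.
coef_global = np.polyfit(x, y, deg=3)
mse_piecewise = np.mean((piecewise_fit(x) - y) ** 2)
mse_global = np.mean((np.polyval(coef_global, x) - y) ** 2)
```

Note that without further constraints the two cubics need not join continuously at the knot; splines add continuity (and derivative) constraints there.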
Local regression computes the fit at a target point $x_0$ using only the nearby training observations.

| Algorithm: Local Regression at $x_0$ |
|---|
| 1. Gather the fraction $s = k/n$ of training points whose $x_i$ are closest to $x_0$. |
| 2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in this neighborhood, so that the point furthest from $x_0$ has weight zero and the closest has the highest weight; all points outside the neighborhood get weight zero. |
| 3. Fit a weighted least squares regression of the $y_i$ on the $x_i$ using these weights, i.e. find $\hat\beta_0, \hat\beta_1$ minimizing $\sum_{i=1}^n K_{i0} (y_i - \beta_0 - \beta_1 x_i)^2$. |
| 4. The fitted value at $x_0$ is $\hat{f}(x_0) = \hat\beta_0 + \hat\beta_1 x_0$. |
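The four steps can be sketched in NumPy as follows (the tri-cube weight function is one common choice; the `span` default and function names are illustrative):

```python
import numpy as np

def local_regression(x0, x, y, span=0.3):
    """Local linear regression at a target point x0 (a sketch, not a reference loess)."""
    n = len(x)
    k = max(2, int(np.ceil(span * n)))        # step 1: the k = s*n nearest points
    dist = np.abs(x - x0)
    neighbors = np.argsort(dist)[:k]
    d_max = dist[neighbors].max()
    # Step 2: tri-cube weights -- zero at the furthest neighbor, largest nearby;
    # every point outside the neighborhood keeps weight zero.
    w = np.zeros(n)
    w[neighbors] = (1.0 - (dist[neighbors] / d_max) ** 3) ** 3
    # Step 3: weighted least squares of y on (1, x).
    A = np.column_stack([np.ones(n), x])
    beta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    # Step 4: evaluate the fitted local line at x0.
    return beta[0] + beta[1] * x0
```

Sliding `x0` over a grid of target points traces out the familiar loess-style smooth curve.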
GAMs automatically model non-linear relationships that standard linear regression would miss, without requiring us to try many transformations of each variable by hand. The non-linear fits can potentially make more accurate predictions for the response. Because the model is additive, we can examine the effect of each predictor on the response individually, holding the other variables fixed. The smoothness of the function fit to each variable can be summarized via its degrees of freedom.
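One classical way to fit a GAM is backfitting: repeatedly smooth each predictor against the partial residuals of the others. A minimal sketch using a Nadaraya-Watson smoother (the smoother choice, bandwidth, and all names are my illustrative assumptions, not a reference implementation):

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Nadaraya-Watson (Gaussian kernel) smooth of residuals r against predictor x."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (W @ r) / W.sum(axis=1)

def backfit_gam(X, y, n_iter=20, bandwidth=0.3):
    """Fit y ~ alpha + sum_j f_j(x_j) by backfitting (a sketch)."""
    n, d = X.shape
    alpha = y.mean()
    F = np.zeros((n, d))                     # F[:, j] holds f_j at the data points
    for _ in range(n_iter):
        for j in range(d):
            # Partial residuals: remove every component except f_j.
            partial = y - alpha - F.sum(axis=1) + F[:, j]
            F[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            F[:, j] -= F[:, j].mean()        # center each f_j for identifiability
    return alpha, F
```

Each fitted column `F[:, j]` can be plotted against `X[:, j]` to examine that predictor's individual (non-linear) effect.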
Least squares corresponds to a Gaussian noise assumption, which makes the estimates sensitive to outliers. Robust regression: replace the Gaussian distribution for the response variable with a distribution that has heavy tails, such as the Student $t$ or Laplace distribution, which assigns higher likelihood to outliers without distorting the overall fit.
| Likelihood | Prior | Posterior | Name |
|---|---|---|---|
| Gaussian | Uniform | Point | Least squares |
| Student | Uniform | Point | Robust regression |
| Laplace | Uniform | Point | Robust regression |
| Gaussian | Gaussian | Point | Ridge |
| Gaussian | Laplace | Point | Lasso |
| Gaussian | Gauss-Gamma | Gauss-Gamma | Bayesian linear regression |
An alternative to the Laplace likelihood is the Huber loss. It is equivalent to the $\ell_2$ loss for errors smaller than a threshold $\delta$, and to the $\ell_1$ loss for larger errors. Unlike the $\ell_1$ loss, the Huber loss function is everywhere differentiable. Consequently, optimizing the Huber loss is much faster than using the Laplace likelihood, since standard smooth optimization methods apply.
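A sketch of the Huber loss, showing how the quadratic and linear pieces meet smoothly at $\delta$ (the default $\delta = 1$ is illustrative):

```python
import numpy as np

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear (slope delta) beyond."""
    quad = 0.5 * r ** 2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

r = np.linspace(-3, 3, 601)
loss = huber_loss(r, delta=1.0)

# At |r| = delta both the value (0.5 * delta**2) and the derivative (delta)
# of the two pieces agree, so the loss is differentiable everywhere.
# For large residuals it grows only linearly, like the l1 loss,
# which limits the influence of outliers.
```

Because the gradient is bounded by $\delta$, a single extreme outlier can only pull the fit by a fixed amount, unlike squared error.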