| 
  | 
A set of (p-dimensional) features 
The first principal component 

Principal components provide low-dimensional linear surfaces that are closest to the observations.
the best 
the optimization problem
the smallest possible value of the objective in (12.6) is
The total variance present in a data set is defined as
the variance explained by the 
the PVE of the 
the variance of the data can be decomposed into the variance of the first 
we can interpret the PVE as the 
 | 
 
 | 
 
 | 
 
 |