L05 Unsupervised Learning

We will...

  • Algorithms
    • Dimensionality Reduction: PCA
    • Clustering: K-Means
  • Coding with Python
  • Financial applications

Challenges

  • Unsupervised learning: a set of features $X_1, X_2, \ldots, X_p$ measured on $n$ observations, without an associated target $Y$

  • Goal: To discover interesting things about the measurements on $X_1, X_2, \ldots, X_p$

    • Is there an informative way to visualize the data?
    • Can we discover subgroups among the variables or among the observations?
  • The Challenge of Unsupervised Learning

    • the exercise tends to be more subjective
    • there is no simple goal for the analysis, such as prediction of a response

Principal Components Analysis (PCA)

How to visualize high-dimensional data? (The Iris Classification Example)


  • tabular data with a small number of features: a pair plot
  • higher-dimensional data: reduce the dimension first, then visualize the data in 2-d or 3-d
  • The big idea of PCA: find a low-dimensional representation of a data set that contains as much of the variation as possible (see the sketch below)
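
A minimal sketch of this idea in Python, using scikit-learn's bundled Iris data (the variable names here are illustrative, not prescribed by the lecture):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    iris = load_iris()
    X = StandardScaler().fit_transform(iris.data)   # center and scale the 4 features

    pca = PCA(n_components=2)
    Z = pca.fit_transform(X)                        # project onto the first 2 PCs

    plt.scatter(Z[:, 0], Z[:, 1], c=iris.target, cmap="viridis")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title("Iris projected onto the first two principal components")
    plt.show()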

Examples

  • a simple example: project from 2-d to 1-d
  • handwritten digit recognition (28×28 pixels, i.e. 784-d, to 2-d; sketched below)
  • human face recognition (64×64 pixels, i.e. 4096-d, to 3-d)
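
A rough sketch of the digit example; scikit-learn's built-in digits are 8×8 (64-d) rather than MNIST's 28×28, but the projection to 2-d works the same way:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    digits = load_digits()                  # 1797 images, 64 features each
    Z = PCA(n_components=2).fit_transform(digits.data)

    plt.scatter(Z[:, 0], Z[:, 1], c=digits.target, cmap="tab10", s=8)
    plt.colorbar(label="digit")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.show()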

A Detailed Example

  • A set of features $X_1, X_2, \ldots, X_p$ (each observation is a $p$-dimensional point)

  • The first principal component

    • is the normalized linear combination of the features
      $Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p$
      that has the largest variance; by normalized we mean $\sum_{j=1}^{p} \phi_{j1}^2 = 1$
    • the loadings of the first principal component: $\phi_{11}, \ldots, \phi_{p1}$
    • the principal component loading vector: $\phi_1 = (\phi_{11}, \phi_{21}, \ldots, \phi_{p1})^T$
  • for a specific point $x_i$, the score on the first principal component:
    $z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip}$

  • the most informative direction: the loading vector $\phi_1$ defines the direction in feature space along which the data vary the most

  • the second principal component $Z_2$

    • the linear combination of $X_1, \ldots, X_p$ that has maximal variance out of all linear combinations that are uncorrelated with $Z_1$
    • equivalent to constraining the loading vector $\phi_2$ to be orthogonal to $\phi_1$ (see the numpy sketch below)
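
A minimal numpy sketch of these definitions, computing the first two loading vectors from the SVD of a centered data matrix (the synthetic data is purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)             # center each feature

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    phi1, phi2 = Vt[0], Vt[1]          # loading vectors phi_1, phi_2
    z1, z2 = X @ phi1, X @ phi2        # principal component scores

    print(np.sum(phi1**2))             # normalization: sum of squared loadings = 1
    print(np.var(z1), np.var(z2))      # Z1 has the largest variance, then Z2
    print(np.corrcoef(z1, z2)[0, 1])   # ~0: Z2 is uncorrelated with Z1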

Another Interpretation of Principal Components

Principal components provide low-dimensional linear surfaces that are closest to the observations.

  • the first $M$ principal component score and loading vectors give the best $M$-dimensional approximation (in terms of Euclidean distance) to the $i$th observation: $x_{ij} \approx \sum_{m=1}^{M} z_{im} \phi_{jm}$

  • the optimization problem:

    $\displaystyle \min_{A \in \mathbb{R}^{n \times M},\; B \in \mathbb{R}^{p \times M}} \; \sum_{j=1}^{p} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{m=1}^{M} a_{im} b_{jm} \Big)^2 \qquad (12.6)$

  • the smallest possible value of the objective in (12.6) is $\sum_{j=1}^{p} \sum_{i=1}^{n} \big( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \big)^2$, attained at $a_{im} = z_{im}$, $b_{jm} = \phi_{jm}$

  • together, the first $M$ principal component score and loading vectors can give a good approximation to the data when $M$ is sufficiently large (see the check below)
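
A small numpy check of this interpretation (synthetic, illustrative data): reconstruct the data from the first $M$ score and loading vectors and evaluate the objective in (12.6):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 6))
    X = X - X.mean(axis=0)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    M = 2
    Z = X @ Vt[:M].T                   # n x M matrix of scores (a_im = z_im)
    X_hat = Z @ Vt[:M]                 # x_ij ~ sum_m z_im * phi_jm

    objective = np.sum((X - X_hat) ** 2)
    print(objective)                   # smallest possible value of (12.6)
    print(np.sum(s[M:] ** 2))          # equal: the trailing squared singular values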

The Proportion of Variance Explained (PVE)

  • The total variance present in a data set (with centered variables) is defined as $\sum_{j=1}^{p} \mathrm{Var}(X_j) = \sum_{j=1}^{p} \frac{1}{n} \sum_{i=1}^{n} x_{ij}^2$

  • the variance explained by the $m$th principal component is $\frac{1}{n} \sum_{i=1}^{n} z_{im}^2$

  • the PVE of the $m$th principal component: $\dfrac{\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^2}$

  • the variance of the data can be decomposed into the variance of the first $M$ principal components plus the mean squared error of this $M$-dimensional approximation, as follows:

    $\displaystyle \sum_{j=1}^{p} \frac{1}{n} \sum_{i=1}^{n} x_{ij}^2 \;=\; \sum_{m=1}^{M} \frac{1}{n} \sum_{i=1}^{n} z_{im}^2 \;+\; \frac{1}{n} \sum_{j=1}^{p} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \Big)^2$

  • we can interpret the PVE as the $R^2$ of the approximation for $X$ given by the first $M$ principal components (a numerical check follows)
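
A numerical check of these formulas (synthetic, illustrative data): the hand-computed PVE should match scikit-learn's explained_variance_ratio_, and the decomposition above should hold exactly:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(150, 4))
    X = X - X.mean(axis=0)

    pca = PCA().fit(X)
    Z = pca.transform(X)
    n = len(X)

    total_var = np.sum(X**2) / n                    # sum_j (1/n) sum_i x_ij^2
    pve = (np.sum(Z**2, axis=0) / n) / total_var
    print(pve)                                      # hand-computed PVE per component
    print(pca.explained_variance_ratio_)            # the same values from scikit-learn

    M = 2                                           # check the decomposition for M = 2
    X_hat = Z[:, :M] @ pca.components_[:M]
    print(total_var)
    print(np.sum(Z[:, :M]**2) / n + np.sum((X - X_hat)**2) / n)   # equal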

Clustering Methods

K-Means Clustering

  • Partitioning a data set into $K$ distinct, non-overlapping clusters.
  • Let $C_1, \ldots, C_K$ denote sets containing the indices of the observations in each cluster. These sets satisfy two properties:
      1. $C_1 \cup C_2 \cup \cdots \cup C_K = \{1, \ldots, n\}$. In other words, each observation belongs to at least one of the $K$ clusters.
      2. $C_k \cap C_{k'} = \emptyset$ for all $k \neq k'$. In other words, the clusters are non-overlapping: no observation belongs to more than one cluster.
  • the big idea: a good clustering is one for which the within-cluster variation is as small as possible:

    $\displaystyle \min_{C_1, \ldots, C_K} \; \sum_{k=1}^{K} W(C_k)$

    • within-cluster variation, measured with squared Euclidean distance (see the sketch below):

      $W(C_k) = \dfrac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} \big( x_{ij} - x_{i'j} \big)^2$
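
A minimal sketch with scikit-learn's KMeans on synthetic blobs ($K = 3$ and the data are illustrative), computing the within-cluster variation directly from the definition; by the centroid identity it equals twice KMeans' inertia_:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    labels = km.labels_

    # W(C_k) = (1/|C_k|) * sum over i, i' in C_k of squared distances,
    # which equals 2 * sum of squared distances to the cluster centroid
    W = 0.0
    for k in range(3):
        C = X[labels == k]
        diffs = C[:, None, :] - C[None, :, :]
        W += np.sum(diffs**2) / len(C)

    print(W, 2 * km.inertia_)          # identical, by the centroid identity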

Financial applications