
#12: Dimensionality Reduction

    Nov 16, 2021 07:10 PM
    Dimensionality Reduction, PCA, Matrix Factorization

    Unsupervised Learning

    Only inputs, no labels

    Two Learning Paradigms

    1. Dimensionality reduction: feature selection
        • PCA: linear
        • Non-negative Matrix Factorization: linear
    2. Clustering: pattern detection

    Dimensionality Reduction


    Computational: less space & time complexity
    Statistical: better generalizability, less-prone to overfitting
    Visualization: better human understandability
    Anomaly detection: can detect outliers


    Data often inherently lies in a low-dimensional space


    • Source dimension: 2; intrinsic dimension: 1
    • Source dimension: 3; intrinsic dimension: 2
    • Example: hand-written digits (high source dimension, low intrinsic dimension)

    Curse of dimensionality

    Given a $d$-dimensional sphere of radius $r$, its volume scales as $V_d(r) \propto r^d$
    Volume of the outermost shell of thickness $\epsilon$: $V_d(r) - V_d(r - \epsilon)$
    As we increase the dimension, most of the sphere's volume is composed of the shell:
    $$\frac{V_d(r) - V_d(r - \epsilon)}{V_d(r)} = 1 - \left(1 - \frac{\epsilon}{r}\right)^d \to 1 \quad \text{as } d \to \infty$$
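    A quick numerical check of this claim (a small numpy sketch; the 5% shell thickness is an arbitrary choice for illustration):

```python
import numpy as np

# Fraction of a d-dimensional sphere's volume lying in the outer shell
# of relative thickness eps/r = 0.05, i.e. 1 - (1 - eps/r)^d.
for d in [1, 2, 10, 100, 1000]:
    shell_fraction = 1 - (1 - 0.05) ** d
    print(f"d = {d:4d}: shell fraction = {shell_fraction:.4f}")
```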

    Why a curse? When trying to fit the same data in 2D instead of 1D:

    • Granularity: maintain the # of bins along each axis
    • If we keep the density of bins constant (each bin holds the same number of samples), the size of the training set explodes: we need more data
    • If we keep the number of examples constant (same # as in 1D), some bins become empty
    Performance of classifiers degrades as dimensionality ↑
    • Need exponentially more data to fit a model in higher dimensions


    Two approaches to dimensionality reduction:
    • Feature Selection
    • Feature Extraction

    Feature Extraction



    Linear Feature Transform

    • Principal Component Analysis (PCA)
    • Non-Negative Matrix Factorization (NMF)
    • Linear Discriminant Analysis (LDA), etc.

    Non-Linear Feature Transform

    • Kernel PCA
    • Locally Linear Embedding (LLE)
    • ISOMAP
    • Auto-encoders, etc.


    PCA optimizes for signal representation; LDA optimizes for classification accuracy.

    Principal Component Analysis (PCA)


    Find a projection that can reconstruct the original input
    • Project (Encode): $z = W^\top x$
    • Reconstruct (Decode): $\hat{x} = W z$
    • Minimize the reconstruction error: $\min_W \sum_i \|x_i - W W^\top x_i\|^2$
    Equivalent to: find the direction with the highest variance
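    A one-step derivation (a sketch, assuming centered data and a single unit-norm direction $w$) shows why minimizing reconstruction error is the same as maximizing projected variance:

```latex
% For a unit vector w (w^T w = 1) and a centered sample x_i:
\|x_i - w w^\top x_i\|^2
  = \|x_i\|^2 - 2 (w^\top x_i)^2 + (w^\top x_i)^2
  = \|x_i\|^2 - (w^\top x_i)^2
% Summing over i, the \|x_i\|^2 term is constant, so minimizing the total
% reconstruction error equals maximizing \sum_i (w^\top x_i)^2,
% the variance of the projected data.
```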




    Eigenvalue magnitude $\propto$ % of variance explained

    Finding one principal component

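    Stated as an optimization (a sketch, with $\Sigma$ the covariance matrix defined in the algorithm below):

```latex
% The first principal component is the direction of maximum variance:
w_1 = \operatorname*{arg\,max}_{\|w\| = 1} \; w^\top \Sigma w
% By the Rayleigh quotient argument, the maximizer is the eigenvector
% of \Sigma with the largest eigenvalue.
```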

    Full PCA algorithm

    1. Recenter the data: subtract the mean of each feature (column)
    2. Compute the covariance matrix $\Sigma = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$ (on the centered data)
    3. Find the eigenvectors & eigenvalues of $\Sigma$
    4. The $k$ eigenvectors w/ the highest eigenvalues = the principal components
    ⚠️ Note: computationally inefficient ($\Sigma$ is $d \times d$)
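    A minimal numpy sketch of this recipe (function and variable names are my own, not from the notes):

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix.
    X: (n, d) data matrix with rows as samples; k: number of components."""
    Xc = X - X.mean(axis=0)                 # 1. recenter: subtract per-feature mean
    cov = (Xc.T @ Xc) / len(Xc)             # 2. d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]       # 4. sort by eigenvalue, descending
    W = eigvecs[:, order[:k]]               # top-k eigenvectors = principal components
    return Xc @ W, W                        # projected data (n, k), components (d, k)

# Usage: project 5-dimensional points down to 2 dimensions
Z, W = pca_eig(np.random.randn(100, 5), k=2)
```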

    Hyper-parameter Estimation

    By visual inspection (scree plot):
    • Eigenvalues drop sharply after some index $k$
    • Ignore the dimensions w/ small eigenvalues
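    One common heuristic, shown below as a sketch: keep the smallest $k$ whose eigenvalues explain a chosen fraction of the total variance (the 95% threshold here is an assumption, not from the notes):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose top-k eigenvalues explain `threshold` of total variance."""
    vals = np.sort(eigvals)[::-1]              # eigenvalues, descending
    cumulative = np.cumsum(vals) / vals.sum()  # cumulative % of variance explained
    return int(np.searchsorted(cumulative, threshold)) + 1
```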

    Method: Singular Value Decomposition (SVD)

    A practical way of performing PCA
    1. Compute the SVD of the (centered) input data matrix $X$: $X = U S V^\top$
    2. With $X$ as a $d \times n$ matrix (columns = samples), the left singular vectors (columns of $U$) = the principal components of $X$
    Time Complexity: $O(\min(n d^2, n^2 d))$ for a full SVD
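    The same computation as a numpy sketch; note it uses the rows-as-samples convention, so the principal components come out as the right singular vectors (rows of Vt) rather than the left ones:

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD, avoiding the explicit d x d covariance matrix.
    X: (n, d) with rows as samples, so components are right singular vectors."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T            # (d, k) principal components
    return Xc @ W, W

Z, W = pca_svd(np.random.randn(100, 5), k=2)
```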

    Method: Eigen Decomposition (Diagonalization)

    1. Compute the eigen decomposition of the covariance matrix: $\Sigma = Q \Lambda Q^\top$ (columns of $Q$ = eigenvectors, $\Lambda$ = diagonal matrix of eigenvalues)

    Example Use Cases

    • Eigenfaces (PCA on face images)
    • Latent semantic analysis (PCA/SVD on term-document matrices)

    Non-negative Matrix Factorization (NMF)

    PCA w/ a non-negative constraint: $X \approx W Z$ with $W \ge 0$, $Z \ge 0$
    • Projection matrix (dictionary) $W \ge 0$
    • Dim-reduced input $Z \ge 0$


    PCA: entries may be negative ⇒ reconstruction works by adding / subtracting dictionary elements
    NMF: $x$ is created only by summing elements of the dictionary
    ⇒ the reduced input becomes more interpretable

    Why no subtraction?

    E.g. subtracting noses from lips does not make any sense
    This assumption is particular to the input data type
    • NMF is NOT universally better than PCA
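    As a sketch of how such a factorization can be fit, here are Lee–Seung multiplicative updates for the squared-error objective (this specific algorithm is my addition; the notes do not name a fitting procedure):

```python
import numpy as np

def nmf(X, k, n_iters=200, eps=1e-9):
    """Factorize non-negative X (n x d) as Z @ W with Z, W >= 0,
    using Lee-Seung multiplicative updates for squared error."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    Z = rng.random((n, k))          # non-negative codes
    W = rng.random((k, d))          # non-negative dictionary
    for _ in range(n_iters):
        Z *= (X @ W.T) / (Z @ W @ W.T + eps)   # update codes
        W *= (Z.T @ X) / (Z.T @ Z @ W + eps)   # update dictionary
    return Z, W

# Usage: every sample is reconstructed by *summing* dictionary rows
Z, W = nmf(np.abs(np.random.randn(100, 20)), k=5)
```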

    Linear Discriminant Analysis (LDA)

    Dimensionality reduction w/ labels: find a direction of projection $w$ that maximizes class separation
    $\mu_c$: mean of input samples in class $c$ ($d$-dimensional)
    $m_c = w^\top \mu_c$: mean of projected samples in class $c$ (1-dimensional)

    Loss Function


    Maximize the distance between the projected means:
    $$\max_w \, (m_1 - m_2)^2 = \max_w \, \big( w^\top (\mu_1 - \mu_2) \big)^2$$

    Bad direction: small gap between projected class means; good direction: big gap
    ❓ In linear algebra terms: the argmax is the direction of an eigenvector

    Fisher Discriminant Analysis

    Increase the distance between class means + decrease the variance within classes
    = LDA normalized by within-class scatter


    • Scatter of a class: $s_c^2 = \sum_{i \in c} (z_i - m_c)^2$
    • Within-class scatter $s_1^2 + s_2^2$: variance of points within each class

    Loss Function

    Maximize the Fisher criterion:
    $$J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$$
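    A minimal two-class sketch, assuming the standard closed-form maximizer $w \propto S_W^{-1}(\mu_1 - \mu_2)$ of this criterion:

```python
import numpy as np

def fisher_lda(X, y):
    """Two-class Fisher LDA: direction w maximizing (m1 - m2)^2 / (s1^2 + s2^2).
    X: (n, d) inputs; y: (n,) labels in {0, 1}. Returns unit vector w (d,)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix S_W = sum over classes of (x_i - mu_c)(x_i - mu_c)^T
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)   # closed form: w ∝ S_W^{-1} (mu1 - mu0)
    return w / np.linalg.norm(w)

# Usage: project onto w and compare the projected class means
X = np.vstack([np.random.randn(50, 3), np.random.randn(50, 3) + 2])
y = np.array([0] * 50 + [1] * 50)
w = fisher_lda(X, y)
```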