
#12: Dimensionality Reduction

    Created
    Nov 16, 2021 07:10 PM
    Topics
    Dimensionality Reduction, PCA, Matrix Factorization

    Unsupervised Learning

    Only inputs, no labels

    Two Learning Paradigms

    1. Dimensionality reduction: feature selection
        • PCA: linear
        • Non-negative Matrix Factorization: linear
    2. Clustering: pattern detection

    Dimensionality Reduction

    Purpose

    Computational: lower space & time complexity
    Statistical: better generalizability, less prone to overfitting
    Visualization: better human understandability
    Anomaly detection: can detect outliers

    Assumption

    Data inherently lies in a low-dimensional space

    Examples

    Source dimension: 2; intrinsic dimension: 1
    Source dimension: 3; intrinsic dimension: 2
    Hand-written Digits

    Curse of dimensionality

    Given a d-dimensional sphere of radius r, its volume scales as V_d(r) ∝ r^d
    Volume of the outermost shell of thickness ε: V_d(r) − V_d(r − ε)
    As we increase the dimension, most of the sphere's volume is composed of the shell:
    [V_d(r) − V_d(r − ε)] / V_d(r) = 1 − (1 − ε/r)^d → 1 as d → ∞
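    A minimal numerical sketch of the claim above; the only assumption is the shell-fraction formula 1 − (1 − ε/r)^d just derived:

```python
def shell_fraction(d, eps=0.01, r=1.0):
    """Fraction of a d-dimensional ball's volume lying in the outer shell of thickness eps."""
    return 1.0 - (1.0 - eps / r) ** d

# The fraction approaches 1 as the dimension grows, even for a very thin shell.
for d in [1, 2, 10, 100, 1000]:
    print(d, round(shell_fraction(d), 4))
```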

    Why a curse? Consider binning the feature space when fitting data in 2D (and higher):

    • Granularity: maintain the # of bins along each axis
    • Keep the density of bins constant (each bin has the same number of samples)
      ⇒ the size of the training set explodes: we need more data
    • Keep the number of examples constant (same # as in 1D)
      ⇒ some bins become empty
    💡
    Performance of classifiers degrades as dimensionality ↑
    • Need exponentially more data to fit a model in higher dimensions
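    A quick back-of-the-envelope illustration of that exponential growth (the 10-bins-per-axis choice is just a hypothetical example):

```python
# With a fixed granularity of 10 bins per axis, the number of cells to populate
# (and hence the data needed to keep their density constant) grows as 10**d.
bins_per_axis = 10
for d in [1, 2, 3, 5, 10]:
    print(f"d={d}: {bins_per_axis ** d} cells")
```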

    Methods

    • Feature Selection
    • Feature Extraction

    Feature Extraction


    Types

    Linear Feature Transform

    • Principal Component Analysis (PCA)
    • Non-Negative Matrix Factorization (NMF)
    • Linear Discriminant Analysis (LDA)
    • Local Linear Embedding (LLE)
    • ISOMAP, etc.

    Non-Linear Feature Transform

    • Kernel PCA
    • Auto-encoders, etc.

    Criteria

    Signal Representation
    Classification Accuracy

    Principal Component Analysis (PCA)

    Intuition

    Find a projection that can reconstruct the original input
    • Project (Encode): z = Wᵀx
    • Reconstruct (Decode): x̂ = Wz
    • Minimize the reconstruction error: Σ_i ‖x_i − W Wᵀ x_i‖²
    Equivalent to: find the direction with the highest variance of the projected data
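    A small NumPy sketch of the encode / decode view and of the error-vs-variance equivalence (toy data; samples are stored as the rows of X):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic toy data
X = X - X.mean(axis=0)
total = np.sum(X ** 2)

# For any unit direction w: reconstruction error + projected energy = total,
# so minimizing the reconstruction error = maximizing the projected variance.
for _ in range(3):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    z = X @ w                                # project (encode)
    x_hat = np.outer(z, w)                   # reconstruct (decode)
    err = np.sum((X - x_hat) ** 2)           # reconstruction error
    print(round(err + np.sum(z ** 2), 6), round(total, 6))  # the two sums match
```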

    Computation


    Derivation

    Eigenvalue magnitude ∝ % of variance explained


    Finding one principal component


    Full PCA algorithm

    1. Recenter data: subtract the mean from each column
    2. Compute the covariance matrix Σ
    3. Find the eigenvectors & eigenvalues of Σ
    4. The eigenvectors w/ the highest eigenvalues = principal components
    ⚠️ Note: computationally inefficient
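    A minimal NumPy sketch of the four steps above (assuming samples are stored as the rows of X):

```python
import numpy as np

def pca(X, k):
    """PCA via eigendecomposition of the covariance matrix.

    X: (n, d) data matrix with one sample per row.
    Returns the top-k principal components (d, k) and the projected data (n, k).
    """
    Xc = X - X.mean(axis=0)                   # 1. recenter the data
    cov = Xc.T @ Xc / len(Xc)                 # 2. covariance matrix (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)    # 3. eigenvectors & eigenvalues
    order = np.argsort(eigvals)[::-1]         # 4. sort by eigenvalue, largest first
    W = eigvecs[:, order[:k]]                 # top-k principal components
    return W, Xc @ W
```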

    Hyper-parameter Estimation

    By visual inspection of the eigenvalue spectrum:
    • Eigenvalues drop off sharply after some point
    • Ignore the dimensions w/ small eigenvalues
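    A sketch of that inspection in code (toy data; the ~95% cut-off is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10)) @ np.diag(np.linspace(3.0, 0.1, 10))  # toy data
Xc = X - X.mean(axis=0)

eigvals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / len(Xc)))[::-1]
explained = eigvals / eigvals.sum()      # eigenvalue magnitude ∝ % of variance
print(np.cumsum(explained))              # pick k where the curve flattens, e.g. ~95%
```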

    Method: Singular Value Decomposition (SVD)

    ✅
    A practical way of performing PCA
    1. Compute the SVD of the input data matrix X: X = U S Vᵀ
    2. Left singular vectors (columns of U) = principal components of X (with samples stored as the columns of X)
    Time Complexity: O(min(n·d², n²·d)) for a d × n data matrix
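    A sketch of the SVD route on toy data; samples are stored as the columns of X here to match the convention above (NumPy itself has no preferred orientation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 200))             # d = 6 features, n = 200 samples (as columns)
Xc = X - X.mean(axis=1, keepdims=True)    # recenter each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
W = U[:, :k]       # left singular vectors = top-k principal components (d, k)
Z = W.T @ Xc       # dim-reduced data (k, n)
```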

    Method: Eigen Decomposition (Diagonalization)

    1. Compute the eigen decomposition of the covariance matrix: Σ = Q Λ Qᵀ; the columns of Q w/ the largest eigenvalues are the principal components
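    A quick sanity check on toy data that this route and the SVD route above recover the same principal subspace (columns may only differ by sign):

```python
import numpy as np

rng = np.random.default_rng(0)
Xc = rng.normal(size=(6, 200))
Xc = Xc - Xc.mean(axis=1, keepdims=True)

U, S, _ = np.linalg.svd(Xc, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])

W_svd = U[:, :2]                  # top-2 components from the SVD route
W_eig = eigvecs[:, ::-1][:, :2]   # top-2 components from the eigen route
print(np.allclose(np.abs(W_svd.T @ W_eig), np.eye(2), atol=1e-6))  # True
```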

    Example Use Cases

    Eigen Faces
    Latent semantic analysis

    Non-negative Matrix Factorization (NMF)

    💡
    PCA w/ a non-negative constraint on both factors of X ≈ W Z:
    • Projection matrix (dictionary) W ≥ 0
    • Dim-reduced input Z ≥ 0

    Purpose

    PCA: entries of W and Z can be negative ⇒ the reconstruction adds / subtracts dictionary elements
    NMF: the reconstruction is created only by summing elements of the dictionary
    ⇒ the reduced input becomes more interpretable
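    A minimal sketch using scikit-learn's NMF (assuming scikit-learn is available; sklearn stores samples as rows, so here X ≈ Z · D with D playing the role of the dictionary):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 20)))   # toy non-negative data (NMF requires X >= 0)

model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
Z = model.fit_transform(X)    # dim-reduced input (100, 5), non-negative
D = model.components_         # dictionary / "parts" (5, 20), non-negative
print(np.linalg.norm(X - Z @ D))   # reconstruction built purely by adding parts
```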

    Why no subtraction?

    E.g. subtracting noses from lips does not make any sense
    ⚠️
    This assumption is specific to the input data type
    • NMF is NOT universally better than PCA

    Linear Discriminant Analysis (LDA)

    💡
    Dimensionality reduction w/ labels: find a direction of projection w that maximizes class separation
    μ_c: mean of the input samples in class c (d-dimensional)
    m_c = wᵀμ_c: mean of the projected samples in class c (1-dimensional)

    Loss Function

    Goal

    Maximize the distance between the projected means: J(w) = |m_1 − m_2|² = |wᵀ(μ_1 − μ_2)|²

    Example

    Bad direction: small gap between the projected means
    Good direction: big gap between the projected means
    ❓ In linear algebra terms: the argmax is the direction of an eigenvector

    Fisher Discriminant Analysis

    💡
    Increase the distance between class means + decrease the variance within classes
    = LDA normalized by within-class scatter

    Terms

    • Scatter of a class: s_c² = Σ_{i∈c} (wᵀx_i − m_c)²
    • Within-class scatter s_1² + s_2²: spread of the projected points within each class

    Loss Function

    Maximize J(w) = |m_1 − m_2|² / (s_1² + s_2²)
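    A minimal two-class Fisher LDA sketch on toy data (the class means and spreads below are made up for illustration); the optimal direction is w ∝ S_W⁻¹(μ_1 − μ_2):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))   # class 1 samples
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(100, 2))   # class 2 samples

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)  # within-class scatter matrix

w = np.linalg.solve(S_W, mu1 - mu2)     # Fisher's optimal direction
w /= np.linalg.norm(w)

m1, m2 = X1 @ w, X2 @ w                 # projected samples per class
s1 = np.sum((m1 - m1.mean()) ** 2)      # within-class scatter of projected class 1
s2 = np.sum((m2 - m2.mean()) ** 2)      # within-class scatter of projected class 2
J = (m1.mean() - m2.mean()) ** 2 / (s1 + s2)   # Fisher criterion
print(J)
```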