Created: Nov 16, 2021 07:10 PM

Topics: Dimensionality Reduction, PCA, Matrix Factorization


## Unsupervised Learning

Only inputs, no labels

#### Two Learning Paradigms

**Dimensionality reduction**: feature selection / extraction

- PCA: linear
- Non-negative Matrix Factorization: linear

**Clustering**: pattern detection

## Dimensionality Reduction

### Purpose

**Computational**: less space & time complexity

**Statistical**: better generalizability, less prone to overfitting

**Visualization**: better human understandability

**Anomaly detection**: can detect outliers

### Assumption

Data inherently lies in a low-dimensional space

#### Examples

- Source dimension: 2; intrinsic dimension: 1 (e.g., points along a curve in the plane)

- Source dimension: 3; intrinsic dimension: 2 (e.g., points on a surface in 3D)

- Hand-written digits: high-dimensional pixel vectors with far fewer underlying degrees of freedom

### Curse of dimensionality

Given a $d$-dimensional sphere of radius $r$ with volume $V_d(r) = K_d r^d$ ($K_d$ a dimension-dependent constant), the volume of the outermost shell of thickness $\epsilon$ is

$$V_d(r) - V_d(r - \epsilon) = K_d \left( r^d - (r - \epsilon)^d \right)$$

As we increase the dimension, most of the sphere's volume is composed of the shell:

$$\frac{V_d(r) - V_d(r - \epsilon)}{V_d(r)} = 1 - \left(1 - \frac{\epsilon}{r}\right)^d \to 1 \quad \text{as } d \to \infty$$
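A quick numeric check in plain Python (a minimal sketch; unit radius and the 1% shell thickness are arbitrary choices):

```python
# Fraction of a unit sphere's volume lying in a thin outer shell:
# 1 - (1 - eps)^d, where eps is the shell thickness (radius r = 1 assumed).
eps = 0.01  # shell is only 1% of the radius
for d in (1, 2, 10, 100, 1000):
    print(f"d={d:5d}  shell fraction = {1 - (1 - eps) ** d:.4f}")
```

Already at $d = 1000$, a 1%-thick shell contains essentially all of the volume.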

**Why a curse? E.g., when trying to fit data in 2D instead of 1D**

- Granularity: maintain the same # of bins along each axis

**Keep the density of bins constant** (each bin has the same number of samples):

- The size of the training set explodes: we need more data

**Keep the number of examples constant** (same # as in 1D):

- Some bins become empty

Performance of classifiers degrades as dimensionality increases:

- We need **exponentially** more data to fit a model in higher dimensions (see the sketch below)
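A minimal sketch of the bin-count explosion (10 bins per axis is an assumed granularity, not from the source):

```python
# With b bins per axis, a d-dimensional grid has b**d bins; keeping even
# one sample per bin therefore requires exponentially more data as d grows.
b = 10  # bins per axis (assumed granularity)
for d in (1, 2, 3, 5, 10):
    print(f"d={d:2d}  total bins = {b ** d:,}")
```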

### Methods

- Feature Selection

- Feature Extraction

## Feature Extraction

### Types

#### Linear Feature Transform

- Principal Component Analysis (PCA)

- Non-Negative Matrix Factorization (NMF)

- Linear Discriminant Analysis (LDA)

#### Non-Linear Feature Transform

- Kernel PCA

- Locally Linear Embedding (LLE)

- ISOMAP

- Auto-encoders, etc.

### Criteria

- **Signal representation**: represent the data as accurately as possible in the lower-dimensional space (e.g., PCA)

- **Classification accuracy**: preserve class-discriminatory information in the lower-dimensional space (e.g., LDA)

## Principal Component Analysis (PCA)

### Intuition

Find a projection that can reconstruct the original input $x \in \mathbb{R}^d$ from an orthonormal basis $W \in \mathbb{R}^{d \times k}$:

- Project (Encode): $z = W^T x$

- Reconstruct (Decode): $\hat{x} = W z$

- Minimize the reconstruction error: $\min_W \sum_i \| x_i - \hat{x}_i \|^2$

**Equivalent to**: finding the directions with the highest variance
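A minimal NumPy sketch of the encode/decode view (the toy data, $k = 2$, and using SVD to obtain an orthonormal $W$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # toy data: 100 samples, 5 features
X = X - X.mean(axis=0)             # recenter

# Orthonormal basis W (d x k); here the top-2 PCs. Rows of X are samples,
# so the PCs are the right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:2].T

Z = X @ W                          # project (encode): z = W^T x
X_hat = Z @ W.T                    # reconstruct (decode): x_hat = W z
print(np.sum((X - X_hat) ** 2))    # reconstruction error
```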

### Computation

Covariance matrix of the recentered data:

$$\Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$$

The principal components are the top eigenvectors of $\Sigma$.

### Derivation

#### Eigenvalue magnitude ∝ % of variance

The fraction of the total variance captured by the $j$-th principal component is $\lambda_j / \sum_k \lambda_k$.

#### Finding one principal component

Maximize the variance of the projection: $\max_w w^T \Sigma w$ subject to $\|w\| = 1$. Setting the gradient of the Lagrangian $w^T \Sigma w - \lambda (w^T w - 1)$ to zero gives $\Sigma w = \lambda w$: the maximizer is an eigenvector of $\Sigma$, and its variance $w^T \Sigma w = \lambda$ is the corresponding eigenvalue.

### Full PCA algorithm

- Recenter data: subtract the mean from each column

- Compute the covariance matrix $\Sigma$

- Find the eigenvectors & eigenvalues of $\Sigma$

- The eigenvectors w/ the highest eigenvalues = the principal components (sketch below)

⚠️ Note: computationally inefficient — forming $\Sigma$ costs $O(nd^2)$ and eigendecomposing it costs $O(d^3)$
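A minimal NumPy sketch of the algorithm above (the function name and the rows-as-samples layout are my choices):

```python
import numpy as np

def pca_via_covariance(X, k):
    """PCA by eigendecomposition of the covariance matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)                 # recenter data
    cov = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues, descending
    W = eigvecs[:, order[:k]]               # top-k eigenvectors = PCs
    return Xc @ W, W                        # reduced data, components
```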

#### Hyper-parameter Estimation

By visual inspection of the eigenvalue spectrum (sketch below):

- Eigenvalues drop sharply after some index

- Ignore the dimensions w/ small eigenvalues
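One way to automate this inspection (a sketch; the function name `choose_k` and the 95% threshold are assumptions, not from the source):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose top-k eigenvalues explain >= threshold of variance."""
    ratios = np.sort(eigvals)[::-1] / np.sum(eigvals)  # % of variance per PC
    return int(np.searchsorted(np.cumsum(ratios), threshold)) + 1
```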

#### Method: Singular Value Decomposition (SVD)

A practical way of performing PCA

- Compute the SVD of the (centered) input data matrix $X \in \mathbb{R}^{d \times n}$, one sample per column: $X = U S V^T$

- Left singular vectors (columns of $U$) = principal components of $X$

Time complexity: $O(\min(nd^2, n^2 d))$ — no need to form the covariance matrix explicitly
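A sketch of the SVD route under the column-per-sample convention above (toy data and $k = 2$ assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))            # d x n: one sample per column
X = X - X.mean(axis=1, keepdims=True)    # center each feature (row)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :2]                             # left singular vectors = top-2 PCs
Z = W.T @ X                              # k x n dim-reduced representation
```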

#### Method: Eigen Decomposition (Diagonalization)

- Compute the eigen decomposition of the covariance matrix: $\Sigma = Q \Lambda Q^T$, where the columns of $Q$ are the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues

### Example Use Cases

- Eigenfaces

- Latent Semantic Analysis

## Non-negative Matrix Factorization (NMF)

PCA w/ a **non-negative** constraint on:

- Projection matrix

- Dim-reduced input

### Purpose

**PCA**: $\hat{x}$ is built by adding / subtracting dictionary elements (weights may be negative)

**NMF**: $\hat{x}$ is created only by summing elements of the dictionary

→ The reduced input becomes more interpretable

#### Why no subtraction?

E.g. subtracting noses from lips does not make any sense

This assumption is particular to the input data type


- NMF is NOT universally better than PCA
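A minimal sketch using scikit-learn's `NMF` (the toy matrix and `n_components=2` are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((20, 6))                   # non-negative input matrix
model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
W = model.fit_transform(X)                # dim-reduced input (non-negative)
H = model.components_                     # dictionary (non-negative)
print(np.linalg.norm(X - W @ H))          # reconstruction error
```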

## Linear Discriminant Analysis (LDA)

**Dimensionality reduction w/ labels**: find a direction of projection $w$ that maximizes class separation

- $\mu_i$: mean of the **input** samples in class $i$ ($d$-dimensional)

- $m_i = w^T \mu_i$: mean of the **projected** samples in class $i$ (1-dimensional)

#### Loss Function

$$J(w) = (m_1 - m_2)^2 = \left( w^T (\mu_1 - \mu_2) \right)^2$$

#### Goal

Maximize distance between projected means

#### Example

- Bad direction: small gap between the projected class means

- Good direction: big gap between the projected class means

In linear algebra terms:

$$w^* = \arg\max_{\|w\|=1} \; w^T S_B w, \qquad S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$$

$w^*$ is the direction of the eigenvector of $S_B$ with the largest eigenvalue, i.e., $w^* \propto \mu_1 - \mu_2$.

### Fisher Discriminant Analysis

Increase the distance between class means
+ Decrease the variance within classes

= LDA normalized by within-class scatter

#### Terms

- Scatter of a class: $s_i^2 = \sum_{x \in C_i} (w^T x - m_i)^2$

- Within-class scatter $s_1^2 + s_2^2$: variance of the projected points within the classes

#### Loss Function

$$J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} = \frac{w^T S_B w}{w^T S_W w}$$

Maximized by $w \propto S_W^{-1} (\mu_1 - \mu_2)$ (see the sketch below).
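A minimal NumPy sketch of this closed form for two classes (the function name is my choice):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Maximizer of J(w): w proportional to S_W^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix S_W (sum over both classes)
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    w = np.linalg.solve(S_W, mu1 - mu2)
    return w / np.linalg.norm(w)

# Usage on two toy Gaussian classes:
rng = np.random.default_rng(0)
w = fisher_direction(rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2)))
print(w)
```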