Created: Sep 9, 2021 06:36 PM

Topics

- Linear Regression
    - Model Composition
    - Loss Function (Error Metric)
        - Residual Sum of Squares
    - Training
    - Making Prediction
    - Model Analysis
        - Significance of a Single Parameter
        - Significance of a Group of Coefficients
- Regularization (Shrinkage Methods)
    - Penalty
        - Usage
    - Ridge Regression
        - Usage
        - Model
        - Loss Function
        - Closed Form Solution
    - LASSO (Least Absolute Shrinkage and Selection Operator) Regression
        - Usage
        - Model
- Bias-Variance Analysis
    - Purpose
    - Formula
    - Example

## Linear Regression

### Model Composition

**$d$-dimensional Input Vector** $\mathbf{x} = (x_1, \dots, x_d)^T$

**Learnable Parameter Vector** $\mathbf{w} = (w_0, w_1, \dots, w_d)^T$

**Target Function** $f$: assumed linear

**Hypothesis Function** $h(\mathbf{x}) = w_0 + \sum_{j=1}^{d} w_j x_j$: any linear function of the parameters

### Loss Function (Error Metric)

Measures the discrepancy between the model prediction and the actual value on the training set

**Residual Sum of Squares**

$$\text{RSS}(\mathbf{w}) = \sum_{i=1}^{N} \big(y_i - h(\mathbf{x}_i)\big)^2$$

### Training

Adjust $\mathbf{w}$ to minimize the loss

- Matrix representation of the hypothesis: subsume the intercept/bias into the parameter vector by augmenting each input vector with a leading 1, giving $h(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$ and $\text{RSS}(\mathbf{w}) = \|\mathbf{y} - X\mathbf{w}\|^2$

- Compute the derivative of the loss with respect to $\mathbf{w}$ and set it to zero

- Solve for the
**closed form solution** $\hat{\mathbf{w}} = (X^T X)^{-1} X^T \mathbf{y}$

### Making Prediction

For a new (augmented) observation $\mathbf{x}$, its prediction can be computed as $\hat{y} = \hat{\mathbf{w}}^T \mathbf{x}$
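The closed-form training step and the prediction step can be sketched in NumPy; the toy data here (a noisy line $y = 2x + 1$) is an assumed example, not from the notes:

```python
import numpy as np

# Toy data: y = 2x + 1 plus Gaussian noise (assumed for illustration)
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=20)
y = 2 * x + 1 + rng.normal(scale=0.3, size=20)

# Augment each input with a leading 1 so the bias is part of w
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution: solve (X^T X) w = X^T y instead of inverting explicitly
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to the true parameters [1, 2]

# Prediction for a new observation x = 4.0 (also augmented with a 1)
y_pred = w_hat @ np.array([1.0, 4.0])
print(y_pred)
```

Using `np.linalg.solve` rather than forming $(X^TX)^{-1}$ directly is the standard numerically safer choice.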

### Model Analysis

#### Significance of a Single Parameter

$\hat{\sigma}^2$ = variance of the observations, assumed uncorrelated with constant variance

For any coefficient $\hat{w}_j$ define the Z-score as

$$z_j = \frac{\hat{w}_j}{\hat{\sigma}\sqrt{v_j}}$$

- $v_j$ is the $j$-th diagonal element of $(X^T X)^{-1}$

**Result**: smaller $|z_j|$ → less important

#### Significance of a Group of Coefficients

**F-statistic:** measures the change in residual sum of squares per added parameter

$$F = \frac{(\text{RSS}_0 - \text{RSS}_1)/(p_1 - p_0)}{\text{RSS}_1/(N - p_1 - 1)}$$

- $\text{RSS}_1$ = residual sum of squares of the
**bigger model with $p_1 + 1$ parameters**

- $\text{RSS}_0$ = residual sum of squares for the
**nested smaller model** with $p_0 + 1$ parameters
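The F-statistic can be computed by fitting both the nested and the bigger model; the data below (where the extra feature `x2` truly has no effect) is an assumed example:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit on X."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ w
    return r @ r

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)  # x2 has no effect

X_small = np.column_stack([np.ones(n), x1])          # nested model, p0 = 1
X_big = np.column_stack([np.ones(n), x1, x2])        # bigger model, p1 = 2

rss0, rss1 = rss(X_small, y), rss(X_big, y)
F = ((rss0 - rss1) / (2 - 1)) / (rss1 / (n - 2 - 1))
print(F)  # a small F suggests the added coefficient group adds little
```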

## Regularization (Shrinkage Methods)

Machine learning = a minimization problem on the training set → prone to overfitting

To combat overfitting, add a term to the loss function that describes the model's complexity:

Regularized Loss Function = Loss Function + Penalty Term

#### Penalty

$$\lambda \sum_j |w_j|^q$$

- $\lambda$ = regularization parameter

- $w_j$ = weight associated with the $j$-th variable
- the penalty is generally taken to be one of the $L_q$-norms

**Usage**

- Helps adjust the complexity of the hypothesis space

- Balances fit and generalizability

### Ridge Regression

Implementation of $L_2$ regularization

- Penalty = squared magnitude ($L_2$-norm) of the coefficients

#### Usage

- Shrinks the coefficients toward, but not exactly to, 0

- Confines the hypothesis space → makes it smaller than the space of all linear functions

- Output is non-sparse

#### Model

- Minimize $\sum_{i=1}^{N} (y_i - \mathbf{w}^T \mathbf{x}_i)^2$

- Subject to the constraint $\sum_j w_j^2 \le t$

Rewritten as an unconstrained problem via a
`Lagrangian Multiplier`

#### Loss Function

$$L(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 + \lambda \sum_j w_j^2$$

#### Closed Form Solution

$$\hat{\mathbf{w}}^{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$$
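The ridge closed form only changes the OLS solution by adding $\lambda I$; a minimal sketch on assumed synthetic data:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = X @ np.array([3.0, -1.0, 0.5, 2.0]) + rng.normal(scale=0.1, size=30)

w_ols = ridge_fit(X, y, 0.0)     # lambda = 0 recovers ordinary least squares
w_ridge = ridge_fit(X, y, 10.0)  # a larger lambda shrinks the coefficients
print(np.linalg.norm(w_ridge), "<", np.linalg.norm(w_ols))
```

In practice the intercept is usually left unpenalized (e.g. by centering the data first); this sketch omits that detail.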

### LASSO (Least Absolute Shrinkage and Selection Operator) Regression

Implementation of $L_1$ regularization

- Penalty = absolute value ($L_1$-norm) of the coefficients

#### Usage

- Penalizes insignificant coefficients all the way to zero

→ acts as a feature-selection method that removes useless coefficients

- Prefers
**sparsity:** fewer terms → better - with a lot of parameters, typically only some of them have predictive power

- Output is sparse: some coefficients are left out

#### Model

- Minimize $\sum_{i=1}^{N} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 + \lambda \sum_j |w_j|$

- Shrinkage factor: the constraint bound $t$ in $\sum_j |w_j| \le t$; unlike ridge, there is no closed-form solution, so the problem is solved numerically
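One way to see why the $L_1$ penalty produces exact zeros: in the special case of an orthonormal design, the lasso solution reduces to soft-thresholding the least-squares coefficients. A sketch under that assumption (the coefficient values are made up for illustration):

```python
import numpy as np

def soft_threshold(w, lam):
    """Soft-thresholding operator: the lasso solution for an
    orthonormal design, applied element-wise to OLS coefficients."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

ols = np.array([3.0, -0.4, 0.05, 1.2])
print(soft_threshold(ols, 0.5))  # small coefficients are zeroed out entirely
```

This is what distinguishes it from ridge, which rescales coefficients but never sets them exactly to zero.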

## Bias-Variance Analysis

### Purpose

Express $E_{\text{out}}$ in a way that helps choose the hypothesis space $\mathcal{H}$

- How well $\mathcal{H}$ can approximate $f$

- How well we can zoom in on a good hypothesis in $\mathcal{H}$

- A theoretical process: $f$ is not accessible in practice

### Formula

$$E_D\big[E_{\text{out}}(h^{(D)})\big] = \underbrace{E_x\big[(\bar{h}(x) - f(x))^2\big]}_{\text{bias}} + \underbrace{E_x\Big[E_D\big[(h^{(D)}(x) - \bar{h}(x))^2\big]\Big]}_{\text{variance}}$$

Given:

- $h^{(D)}$ = the learned hypothesis, written to show its dependence on the training set $D$

- $E_D$ = expectation over $D$, based on the distribution the data is drawn from

- $\bar{h}(x) = E_D[h^{(D)}(x)]$ = average hypothesis over multiple draws of $D$
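The decomposition can be estimated by simulation, since there $f$ is known. A sketch using an assumed setup (target $f(x) = \sin(\pi x)$, constant hypotheses fit to 2-point datasets):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)   # target function (known only in simulation)
xs = np.linspace(-1, 1, 200)      # grid approximating the expectation over x

# Learn a constant hypothesis h(x) = mean(y) from many independent 2-point datasets
preds = []
for _ in range(5000):
    x_train = rng.uniform(-1, 1, size=2)
    preds.append(np.full_like(xs, f(x_train).mean()))
preds = np.array(preds)

h_bar = preds.mean(axis=0)                 # average hypothesis over draws of D
bias2 = np.mean((h_bar - f(xs)) ** 2)      # how far the average hypothesis is from f
variance = np.mean(preds.var(axis=0))      # spread of hypotheses around the average
print(f"bias^2 = {bias2:.2f}, variance = {variance:.2f}")
```

The constant model cannot track $\sin(\pi x)$ at all (high bias), but its fits barely vary across datasets (low variance).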


### Example

**Given:** two candidate hypothesis sets

- Simple model: High bias | Low variance

- Complex model: Low bias | High variance

**Which one is better?**

Match the
**model complexity** to the **data resources**, NOT to the **target complexity** → **better generalizability**

**Stick to the simplest answer!! (Occam's Razor)**

## Multi-Class Classification

Use $K$ Logistic Regression models to classify $K$ classes

#### Likelihood for a Single Sample (`Softmax` Function)

The `softmax` function is a function that **turns a vector of $K$ real values** into **a vector of $K$ real values that sum to 1**:

$$P(y = k \mid \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x}}}{\sum_{j=1}^{K} e^{\mathbf{w}_j^T \mathbf{x}}}$$

#### Likelihood of the Data

#### Loss Function (Cross-Entropy Loss)

$$L(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{K} y_j \log \hat{y}_j$$

- $\hat{y}_j$ = value of the output corresponding to the $j$-th class
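The softmax and cross-entropy pieces can be sketched directly (the score vector and one-hot target below are assumed toy inputs):

```python
import numpy as np

def softmax(z):
    """Turn K real scores into K probabilities that sum to 1.
    Subtracting the max first is a standard numerical-stability trick."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y_true, y_prob):
    """Cross-entropy loss for a one-hot target vector."""
    return -np.sum(y_true * np.log(y_prob))

scores = np.array([2.0, 1.0, 0.1])        # raw outputs of the K linear models
probs = softmax(scores)
print(probs)                              # probabilities, summing to 1
print(cross_entropy(np.array([1, 0, 0]), probs))
```

Because the target is one-hot, the loss reduces to $-\log \hat{y}_c$ for the true class $c$, so a confident correct prediction gives a loss near 0.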