Created: Sep 21, 2021 05:49 PM

Topics: Validation (Formula, Choosing Validation Set Size, Fold-Back-In Validation), Model Selection, Cross Validation (Leave One Out Analysis, Pros & Cons, Model Selection with Cross Validation), Bayesian Decision Theory (Bayes Rule, Error, Conditional Risk, Bayes Decision Rule), Frequentist vs Bayesian Approach, Bayesian Parameter Estimation (Example: Logistic Regression, How to Choose a Prior), Sequential Learning, Naive Bayes Classifier, Discriminative vs Generative Model

## Validation

For any hypothesis $h$:

$$E_{out}(h) = E_{in}(h) + \text{overfitting penalty}$$

- Overfitting penalty: estimated by regularization
- $E_{out}(h)$: estimated by **validation**

### Formula

Dataset $\mathcal{D}$ of $N$ samples, split into:

- Training set $\mathcal{D}_{train}$: $N - K$ samples
- Validation set $\mathcal{D}_{val}$: $K$ samples

Let $g^-$ denote the hypothesis selected after training on $\mathcal{D}_{train}$. The validation error is

$$E_{val}(g^-) = \frac{1}{K} \sum_{(x_k, y_k) \in \mathcal{D}_{val}} e\big(g^-(x_k), y_k\big)$$

**Expected value of validation error:**

$$\mathbb{E}\big[E_{val}(g^-)\big] = E_{out}(g^-)$$

This is because each pointwise error $e\big(g^-(x_k), y_k\big)$ is an unbiased estimate of $E_{out}(g^-)$: the validation points were never used in training. Thus we have:

- The expected value of the validation error closely matches the out-of-sample error $E_{out}(g^-)$.
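As a minimal sketch of the estimate above, the hold-out validation error can be computed directly. The linear data, the noise level, and the fixed hypothesis `g_minus` (standing in for a trained $g^-$) are all illustrative assumptions, not part of the notes:

```python
import random

# Toy data: y = 2x + Gaussian noise (made-up example).
random.seed(0)
data = [(x / 10, 2 * (x / 10) + random.gauss(0, 0.1)) for x in range(100)]
random.shuffle(data)

K = 20                        # validation set size
val, train = data[:K], data[K:]   # D_val and D_train

def g_minus(x):
    # A fixed hypothesis standing in for one trained on D_train.
    return 2 * x

# E_val(g^-) = (1/K) * sum of pointwise squared errors on D_val
e_val = sum((g_minus(x) - y) ** 2 for x, y in val) / K
print(round(e_val, 4))
```

Because the validation points never influenced `g_minus`, `e_val` is an unbiased estimate of its out-of-sample error.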

#### Choosing Validation Set Size

$K$ ⬆️ ⇒ variance of the estimate $E_{val}(g^-)$ ⬇️

$K$ too large ⇒ training set size $N - K$ too small, so $g^-$ itself gets worse

**Practical Rule**: use 20% of $\mathcal{D}$ as the validation set ($K = N/5$)

### Fold-Back-In Validation

#### Procedure: Make the Most Use of the Data

- Separate the data into a training set & a validation set

- Tune the model on the validation set to find the best hyper-parameters

- Put the validation set back in and train one last time on all the data
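The steps above can be sketched as follows; the polynomial-degree hyper-parameter and the synthetic data are illustrative assumptions:

```python
import numpy as np

# Toy data: y = 1 + 2x + noise, so the "right" model is degree 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 1 + 2 * x + rng.normal(0, 0.05, size=x.shape)

# Step 1: separate the training & validation set.
idx = rng.permutation(len(x))
val_idx, tr_idx = idx[:10], idx[10:]

# Step 2: tune the hyper-parameter (polynomial degree) on the validation set.
best_deg, best_err = None, float("inf")
for deg in (0, 1, 2, 3):
    coef = np.polyfit(x[tr_idx], y[tr_idx], deg)
    err = np.mean((np.polyval(coef, x[val_idx]) - y[val_idx]) ** 2)
    if err < best_err:
        best_deg, best_err = deg, err

# Step 3: fold the validation set back in and train one last time on all data.
final_coef = np.polyfit(x, y, best_deg)
print(best_deg, np.round(final_coef, 2))
```

The final fit uses every sample, while the hyper-parameter choice was still made on data the candidate models never trained on.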

### Model Selection

Use the same validation set multiple times **without** losing guarantees. Assume you have $M$ models (hypothesis sets): $\mathcal{H}_1, \dots, \mathcal{H}_M$.

### Cross Validation

Split the data into $K$ folds; each fold serves as the validation set once while the remaining $K - 1$ folds are used for training.

**Small K:** cheap, but each training set is small, so the error estimate is pessimistically biased

**Large K:** more accurate estimate, but more expensive (more models to train)

#### Leave One Out Analysis

- Leave 1 sample out, use the remaining $N - 1$ samples to train the model

- Compute the model's error $e_n$ on the 1 held-out sample

- Repeat steps 1 & 2 for $N$ times (once per sample)

- Compute the
**Cross Validation Error**:

$$E_{cv} = \frac{1}{N} \sum_{n=1}^{N} e_n$$
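A minimal sketch of leave-one-out analysis, using a deliberately trivial "model" (predict the mean of the training targets) so the whole loop fits in a few lines; the data values are made up:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
N = len(y)

errors = []
for n in range(N):                      # repeat N times
    train = np.delete(y, n)             # leave sample n out, train on the rest
    pred = train.mean()                 # "train" the mean predictor
    errors.append((pred - y[n]) ** 2)   # error e_n on the 1 held-out sample

e_cv = np.mean(errors)                  # cross validation error E_cv
print(e_cv)
```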

#### Pros & Cons

**Pro**: Far more accurate estimate of the out-of-sample error than a single validation set

**Con**: Very expensive

- The model is trained $N$ times

### Model Selection with Cross Validation

- Define $M$ models by choosing different values of the hyper-parameter $\lambda$: $\lambda_1, \dots, \lambda_M$

- For each model $m$:
    - Run the cross validation module to get an estimate of the cross validation error $E_{cv}(m)$
    - Pick the model $m^*$ with the smallest error

- Train the model $m^*$ on the entire training set to obtain the final hypothesis
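The procedure above can be sketched for ridge regression, where $\lambda$ is the regularization strength; the synthetic data, the candidate $\lambda$ grid, and the 5-fold split are all illustrative assumptions:

```python
import numpy as np

# Toy linear data with known weights (made-up example).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=40)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

folds = np.array_split(np.arange(40), 5)
best_lam, best_cv = None, float("inf")
for lam in (0.01, 0.1, 1.0, 10.0):          # the M candidate models
    errs = []
    for f in folds:                          # cross validation error estimate
        tr = np.setdiff1d(np.arange(40), f)
        w = ridge_fit(X[tr], y[tr], lam)
        errs.append(np.mean((X[f] @ w - y[f]) ** 2))
    if np.mean(errs) < best_cv:              # pick the smallest-error model
        best_lam, best_cv = lam, np.mean(errs)

# Train the chosen model on the entire set for the final hypothesis.
w_final = ridge_fit(X, y, best_lam)
print(best_lam, np.round(w_final, 2))
```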

## Bayesian Decision Theory

### Bayes Rule

#### Given

- State of nature $\omega$: e.g. $\omega \in \{\omega_1, \omega_2\}$

- Prior probabilities $P(\omega_1), P(\omega_2)$

- Class conditional probability: $p(x \mid \omega_j)$, the probability of observing $x$ when the state of nature is $\omega_j$

#### To Find: Posterior Probability

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

### Error

$$P(\text{error} \mid x) = \begin{cases} P(\omega_1 \mid x) & \text{if we decide on } \omega_2 \\ P(\omega_2 \mid x) & \text{if we decide on } \omega_1 \end{cases}$$

The evidence $p(x)$ is a scaling factor ⇒ Pick $\omega_1$ if $p(x \mid \omega_1)\, P(\omega_1) > p(x \mid \omega_2)\, P(\omega_2)$, otherwise pick $\omega_2$.
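A small numerical sketch of the rule above; the Gaussian class-conditionals, priors, and the observation $x = 0.5$ are made-up numbers for illustration:

```python
import math

priors = {1: 0.6, 2: 0.4}            # P(w1), P(w2)
means, sd = {1: 0.0, 2: 2.0}, 1.0    # Gaussian class-conditional parameters

def likelihood(x, w):
    # p(x | w_j): Gaussian density with mean means[w], std sd
    return math.exp(-((x - means[w]) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

x = 0.5
evidence = sum(likelihood(x, w) * priors[w] for w in priors)    # p(x)
posterior = {w: likelihood(x, w) * priors[w] / evidence for w in priors}

decision = max(posterior, key=posterior.get)   # pick the larger posterior
error = 1 - posterior[decision]                # P(error | x) = the other posterior
print(decision, round(error, 3))
```

Dividing by the evidence only rescales both posteriors, so the decision is the same as comparing $p(x \mid \omega_j)\, P(\omega_j)$ directly.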

### Conditional Risk

Multi-dimensional observations are represented as a vector $\mathbf{x} \in \mathbb{R}^d$.

#### Bayes Theorem

$$P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}{p(\mathbf{x})}$$

#### Evidence

$$p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

#### Conditional Risk

Expected loss associated with taking an action $\alpha_i$ when the true state of nature is $\omega_j$:

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x})$$

### Bayes Decision Rule

To minimize the overall risk, compute $R(\alpha_i \mid \mathbf{x})$ for every action $\alpha_i$ & take the action that minimizes the conditional risk for the observed $\mathbf{x}$:

$$\alpha^* = \arg\min_i R(\alpha_i \mid \mathbf{x})$$
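A minimal sketch of the decision rule with an explicit loss matrix $\lambda(\alpha_i \mid \omega_j)$; the posteriors and loss values are made-up numbers chosen so that the risk-minimizing action differs from the most probable class:

```python
posterior = {1: 0.7, 2: 0.3}        # P(w_j | x), assumed already computed
loss = {                            # loss[(action, true state)] = lambda(a_i | w_j)
    (1, 1): 0.0, (1, 2): 10.0,      # deciding 1 when the truth is 2 is very costly
    (2, 1): 1.0, (2, 2): 0.0,
}

# Conditional risk R(a_i | x) = sum_j lambda(a_i | w_j) * P(w_j | x)
risk = {a: sum(loss[(a, w)] * posterior[w] for w in posterior) for a in (1, 2)}

decision = min(risk, key=risk.get)  # take the action minimizing conditional risk
print(risk, decision)
```

Note that even though class 1 is more probable, the asymmetric loss makes action 2 the risk-minimizing choice.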

## Frequentist vs Bayesian Approach

- Frequentist: parameters $\theta$ are fixed but unknown constants, estimated from the data (e.g. maximum likelihood)
- Bayesian: parameters $\theta$ are random variables with a prior distribution; the data updates this prior into a posterior

## Bayesian Parameter Estimation

- Start with a prior distribution $p(\theta)$
    - = prior knowledge about the parameters before looking at the data

- Given a data set $\mathcal{D} = \{x_1, \dots, x_n\}$

- Use the Bayes Theorem to find the posterior distribution of $\theta$ given $\mathcal{D}$:

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$$

- Denominator $p(\mathcal{D})$ = marginal distribution of the data
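A minimal sketch of these steps for a coin-flip model, where conjugacy makes the posterior available in closed form: a Beta prior on $\theta$ with a Bernoulli likelihood yields a Beta posterior. The prior parameters and the data are illustrative assumptions:

```python
# Prior Beta(a, b): knowledge about theta before looking at the data.
a, b = 2.0, 2.0
data = [1, 0, 1, 1, 1, 0, 1, 1]      # observed dataset D (1 = heads)

# Conjugate update: posterior is Beta(a + #heads, b + #tails);
# the evidence p(D) cancels out of the closed form.
heads, tails = sum(data), len(data) - sum(data)
a_post, b_post = a + heads, b + tails

posterior_mean = a_post / (a_post + b_post)   # E[theta | D]
print(a_post, b_post, round(posterior_mean, 3))
```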

### Example: Logistic Regression

$$p(\theta \mid \mathcal{D}) \propto p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$$

⬆️ The product term is the likelihood of the data.

Related topics:

- Maximum A Posteriori Estimation: $\hat{\theta}_{MAP} = \arg\max_\theta\, p(\theta \mid \mathcal{D})$
- Bayesian Linear Regression
- Generative vs Discriminative Models

## How to Choose a Prior

- Objective: non-informative priors (e.g. uniform), letting the data dominate

- Subjective: encode domain knowledge or beliefs about the parameters

- Conjugate: choose a prior such that the posterior belongs to the same family as the prior (e.g. a Beta prior for a Bernoulli likelihood)

## Sequential (Online) Learning

Prior = posterior of the previous iteration: update the posterior one observation at a time.
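A minimal sketch of this idea with the Beta-Bernoulli model (the stream of observations is made up): each posterior becomes the prior for the next data point, and the sequential result matches a single batch update on all the data.

```python
a, b = 1.0, 1.0                      # initial prior Beta(1, 1)
stream = [1, 1, 0, 1, 0, 1, 1, 1]

for x in stream:                     # one Bayesian update per observation:
    a, b = a + x, b + (1 - x)        # prior <- posterior of the previous step

print(a, b)                          # same as batch Beta(1 + #heads, 1 + #tails)
```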

## Naive Bayes Classifier

Assumes all
**features** are conditionally **independent** of the other features given the class ⇒ each feature only depends on $y$:

$$P(x_1, \dots, x_d \mid y) = \prod_{i=1}^{d} P(x_i \mid y)$$

⇒ Reduces the number of parameters from $O(2^d)$ to $O(d)$ per class (for binary features)

**Generative Model**: it models $P(x \mid y)$ and $P(y)$ rather than $P(y \mid x)$ directly
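The classifier above can be sketched for binary features; the tiny dataset, the `fit`/`predict` helper names, and the use of Laplace smoothing (to avoid zero probabilities) are all illustrative assumptions:

```python
import math

X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]   # binary feature vectors
y = [1, 1, 0, 0]                                   # class labels

def fit(X, y):
    params = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = len(rows) / len(X)                 # P(y = c)
        # One Bernoulli parameter P(x_i = 1 | y = c) per feature -- this is
        # the conditional-independence assumption, with Laplace smoothing.
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(len(X[0]))]
        params[c] = (prior, probs)
    return params

def predict(params, x):
    def log_joint(c):
        prior, probs = params[c]
        # log P(y) + sum_i log P(x_i | y): the product becomes a sum of logs.
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for p, xi in zip(probs, x))
    return max(params, key=log_joint)

model = fit(X, y)
print(predict(model, [1, 1, 0]))
```

Fitting only requires counting, because each class needs just $d$ Bernoulli parameters plus a prior, rather than a full joint table over $2^d$ feature combinations.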