Created

Nov 9, 2021 07:18 PM

Topics

Bagging & Boosting


## Bagging (Bootstrap Aggregating)

Reduces variance without increasing bias.

Key idea: averaging reduces variance.

E.g. cross-validation: gives a more stable error estimate by averaging over multiple independent folds.

### Bootstrap Sampling

A method of resampling (with replacement) from our observed sample to simulate sampling from the population

- Use the empirical distribution of our data to estimate the true, unknown data-generating distribution

- Not every original sample will be chosen, and some may be chosen more than once

- P(a given sample is chosen at least once) = $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$ (see the sketch below)
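
Not from the notes, but a minimal numpy sketch that checks the 0.632 figure empirically; the sample size `n = 10_000` is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
data_indices = np.arange(n)

# Draw a bootstrap sample: n draws with replacement from the n original points.
bootstrap = rng.choice(data_indices, size=n, replace=True)

# Fraction of original points that appear at least once in the bootstrap sample.
frac_chosen = np.unique(bootstrap).size / n
print(f"fraction chosen: {frac_chosen:.3f}")  # ~0.632, i.e. 1 - (1 - 1/n)^n -> 1 - 1/e
```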

### Classification Tree Example

Average the predictions over a collection of bootstrap samples

#### Setup & Training

Number of trees = $B$

Bootstrap sample size = original sample size $n$

Train one tree on each of the $B$ bootstrap samples

#### Making Predictions

1. Plurality vote over the $B$ predictions
2. Predict with the averaged (combined) class probabilities
    - More well-behaved than 1.

(Both options are sketched below.)
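
A rough sketch of both prediction schemes, assuming scikit-learn's `DecisionTreeClassifier` as the base learner, synthetic binary data, and an arbitrary choice of $B = 25$ trees:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n, B = len(X), 25

trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)                 # bootstrap sample of size n
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Option 1: plurality (here: majority) vote over the B hard predictions.
votes = np.stack([t.predict(X) for t in trees])      # shape (B, n), labels 0/1
vote_pred = (votes.mean(axis=0) > 0.5).astype(int)

# Option 2: average the predicted class probabilities, then take the most probable class.
probs = np.stack([t.predict_proba(X)[:, 1] for t in trees])  # P(class 1) per tree
prob_pred = (probs.mean(axis=0) > 0.5).astype(int)
```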

### Out-of-Bag (OOB) Error Estimation

OOB samples: the samples not contained in a given bootstrap sample (roughly one third of the data per tree)

⇒ Can be treated as held-out test data for that tree

- Each sample is OOB for roughly $B/3$ of the trees (since $P(\text{not chosen}) \approx e^{-1} \approx 0.368$)

- Predict the response for the $i$-th observation using only the trees for which it was OOB, then vote/average; comparing these predictions with $y_i$ over all $i$ gives the OOB error estimate

? When should we record whether a sample is OOB
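
A hedged sketch of OOB error estimation under the same assumptions as above (scikit-learn trees, synthetic binary data); each tree only casts votes for the samples it never saw during training:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n, B = len(X), 50

vote_counts = np.zeros((n, 2))                       # per-sample OOB votes for classes 0 and 1
for _ in range(B):
    idx = rng.integers(0, n, size=n)                 # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)            # samples NOT drawn -> OOB for this tree
    tree = DecisionTreeClassifier().fit(X[idx], y[idx])
    pred = tree.predict(X[oob])
    vote_counts[oob, pred] += 1                      # record OOB votes only

has_oob = vote_counts.sum(axis=1) > 0                # samples that were OOB at least once
oob_pred = vote_counts.argmax(axis=1)
oob_error = np.mean(oob_pred[has_oob] != y[has_oob])
print(f"OOB error estimate: {oob_error:.3f}")
```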

## Random Forests

Decorrelates the individual bagged trees with a small tweak to the training procedure

### Random Feature Selection

Essentially drops out a random subset of the input features at each split

At each node, select a random subset of $m$ predictors (out of all $p$)

Split on the best predictor within that subset

In practice: $m \approx \sqrt{p}$ for classification (and $m \approx p/3$ for regression)
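
For illustration, scikit-learn's `RandomForestClassifier` exposes this choice through `max_features`; setting it to `None` considers all $p$ predictors at every split, which recovers plain bagged trees (the synthetic data and hyperparameters here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features="sqrt": at each split, consider ~sqrt(p) randomly chosen predictors.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

# max_features=None: every split sees all p predictors, i.e. plain bagging.
bagged = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0)

print("random forest:", cross_val_score(rf, X, y, cv=5).mean())
print("bagged trees: ", cross_val_score(bagged, X, y, cv=5).mean())
```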

### Summary

### Analysis

Random forests are typically better predictors than bagged trees: decorrelating the trees reduces the variance of their average.

## Boosting

A sequential (iterative) process:
Combine multiple weak classifiers to classify data that is not linearly separable

**Weak learner:** a classification model with accuracy only slightly better than 50% (chance)

- If we used a strong learner, it would almost certainly overfit

- Associate a weight $w_i$ with each training sample, initialized uniformly ($w_i = 1/n$)

- Loop until convergence:
    - Train the weak learner on a bootstrap sample of the weighted training set
    - Increase $w_i$ for misclassified samples; decrease it otherwise

#### Prediction

Do a weighted majority vote over the trained classifiers

#### Intuition

### Adaboost

- $\alpha_m$ = weight for model $m$: $\alpha_m = \frac{1}{2}\ln\frac{1-\epsilon_m}{\epsilon_m}$

- $\epsilon_m$ = weighted classification error of model $m$: $\epsilon_m = \frac{\sum_i w_i\,\mathbb{1}[h_m(x_i) \neq y_i]}{\sum_i w_i}$

- $w_i$ = weight of sample $i$, updated from model $m$'s classification errors: $w_i \leftarrow w_i\, e^{+\alpha_m}$ if misclassified, $w_i \leftarrow w_i\, e^{-\alpha_m}$ otherwise, then renormalized (a sketch follows this list)
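
A minimal AdaBoost sketch following the formulas above, assuming depth-1 scikit-learn trees (decision stumps) as the weak learner and labels recoded to $\pm 1$; for simplicity it reweights samples directly via `sample_weight` instead of drawing a weighted bootstrap sample:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_pm = np.where(y == 1, 1, -1)          # AdaBoost works with labels in {-1, +1}
n, M = len(X), 50

w = np.full(n, 1 / n)                   # uniform initial sample weights
stumps, alphas = [], []
for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1)       # weak learner: decision stump
    stump.fit(X, y_pm, sample_weight=w)
    miss = stump.predict(X) != y_pm
    eps = np.sum(w * miss) / np.sum(w)                # weighted classification error
    alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))   # model weight
    w *= np.exp(alpha * np.where(miss, 1.0, -1.0))    # up-weight misses, down-weight hits
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Prediction: weighted majority vote over the trained stumps.
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training error:", np.mean(np.sign(F) != y_pm))
```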

### Gradient Boosting

Fit a model on the residuals and add it to the original model

#### Intuition

Given an imperfect model $F$

- Cannot delete or change anything inside $F$

- Can only add an additional model $h$ to $F$ to get $F + h$

We want $F(x_i) + h(x_i) = y_i$, i.e. $h(x_i) = y_i - F(x_i)$ by intuition

Use regression to find such an $h$ (it will only approximate the residuals)

#### Procedure

Training set = residuals of the current model: $\{(x_i,\; y_i - F(x_i))\}$

Train $h$ on this set

New model = $F + h$

Repeat until satisfied
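
A minimal sketch of this procedure for regression, assuming small scikit-learn regression trees as the base learner; the shrinkage factor `lr` is an extra assumption not in the notes (without it, the new model is exactly $F + h$):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
M, lr = 100, 0.1

F = np.full(len(y), y.mean())           # initial model: predict the mean
learners = []
for _ in range(M):
    r = y - F                           # residuals of the current ensemble
    h = DecisionTreeRegressor(max_depth=3).fit(X, r)   # fit a small tree to the residuals
    F = F + lr * h.predict(X)           # new model = old model + (shrunken) correction
    learners.append(h)

print("training MSE:", np.mean((y - F) ** 2))
```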

#### Theory: Relation to Gradients

Squared loss function: $L(y, F(x)) = \frac{1}{2}\big(y - F(x)\big)^2$

Minimizing $L$ w.r.t. $F(x_i)$ gives $\frac{\partial L}{\partial F(x_i)} = F(x_i) - y_i$

⇒ Residuals are the negative gradients: $y_i - F(x_i) = -\frac{\partial L}{\partial F(x_i)}$

#### Update Rule

$F_{m+1}(x_i) = F_m(x_i) - \rho\,\frac{\partial L}{\partial F_m(x_i)}$, i.e. fit $h_m$ to the negative gradients (pseudo-residuals) and take a step in that direction

Can use other loss functions: just recompute the negative gradient for the chosen loss
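
For example, with absolute loss $L = |y - F(x)|$ the pseudo-residuals are $\operatorname{sign}(y - F(x))$. A simplified sketch under the same assumptions as the previous block (it skips the per-leaf line search that full gradient boosted trees usually perform):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
M, lr = 100, 0.1

F = np.full(len(y), np.median(y))       # for absolute loss, the best constant is the median
for _ in range(M):
    # Pseudo-residuals: negative gradient of L(y, F) = |y - F| w.r.t. F, i.e. sign(y - F).
    r = np.sign(y - F)
    h = DecisionTreeRegressor(max_depth=3).fit(X, r)
    F = F + lr * h.predict(X)

print("mean absolute error:", np.mean(np.abs(y - F)))
```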