#11: Ensemble Learning

    Nov 9, 2021 07:18 PM
    Bagging & Boosting

    Bagging (Bootstrap Aggregating)

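    Reduces variance without increasing bias
    : Averaging reduces variance (the mean of $n$ independent estimates, each with variance $\sigma^2$, has variance $\sigma^2 / n$)
    E.g. cross-validation: gives a more stable error estimate by averaging over multiple folds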
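
    A minimal sketch of why averaging reduces variance, assuming a toy setup (not from the lecture) in which every "model" returns the true value 0 plus independent Gaussian noise. The names `draw`, `n_models` and `n_trials` are illustrative only.

```python
import random
import statistics

random.seed(0)

def draw():
    # One noisy estimate: true value 0, noise variance 1 (assumed toy setup).
    return random.gauss(0.0, 1.0)

n_models = 25     # number of independent estimates averaged together
n_trials = 2000   # repetitions used to measure each variance empirically

# Variance of a single estimate.
single = [draw() for _ in range(n_trials)]

# Variance of the average of n_models estimates.
averaged = [sum(draw() for _ in range(n_models)) / n_models
            for _ in range(n_trials)]

var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)
print(var_single, var_averaged)  # the second should be close to var_single / n_models
```

    Bagged trees are trained on overlapping bootstrap samples, so their predictions are correlated and the reduction is smaller than this independent case suggests.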

    Bootstrap Sampling

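    A method of resampling from our sample data to simulate sampling from the population
    1. Use the empirical distribution of our data to estimate the true, unknown data-generating distribution
    1. Sampling is with replacement: some observations are never chosen, while others are chosen more than once
    1. P(a given observation is chosen) = $1 - (1 - 1/n)^n \approx 1 - 1/e \approx 0.632$ for a bootstrap sample of size $n$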
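
    The 0.632 figure can be checked empirically. This is a small sketch, assuming a data set of `n = 1000` placeholder observations (the value of `n` is arbitrary):

```python
import random

random.seed(0)

n = 1000
data = list(range(n))                 # stand-in for n observations

# A bootstrap sample: n draws from the data, with replacement.
sample = random.choices(data, k=n)

# Fraction of distinct original observations that made it into the sample.
frac_in_bag = len(set(sample)) / n

# Exact inclusion probability for one observation.
exact = 1 - (1 - 1 / n) ** n

print(frac_in_bag, exact)             # both should be close to 0.632
```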

    Classification Tree Example

    Average the predictions over a collection of bootstrap samples

    Setup & Training

    Number of trees =
    Bootstrap sample size = original sample size
    Train one tree on each bootstrap sample

    Making Predictions

    1. Plurality vote over the trees' predicted classes
    1. Average the trees' predicted class probabilities and predict the most probable class
        • Better behaved than option 1.
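
    The two ways of combining predictions can disagree. A short sketch, using made-up class probabilities for three hypothetical trees (the numbers in `tree_probs` are illustrative, not from the lecture):

```python
from collections import Counter

# Per-tree predicted probabilities for classes 0 and 1 (hypothetical values).
tree_probs = [
    [0.51, 0.49],   # barely prefers class 0
    [0.51, 0.49],   # barely prefers class 0
    [0.05, 0.95],   # strongly prefers class 1
]

def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

# Option 1: each tree casts one vote for its most probable class.
votes = [argmax(p) for p in tree_probs]
plurality = Counter(votes).most_common(1)[0][0]

# Option 2: average the probabilities, then pick the most probable class.
n_classes = len(tree_probs[0])
mean_probs = [sum(p[k] for p in tree_probs) / len(tree_probs)
              for k in range(n_classes)]
averaged = argmax(mean_probs)

print(plurality, averaged, mean_probs)
```

    Voting discards how confident each tree is, so two near-ties outvote one confident tree; averaging the probabilities keeps that information.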

    Out-of-Bag (OOB) Error Estimation

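    OOB samples: the observations not contained in a given bootstrap sample
    ⇒ Treat them as test data for the tree trained on that bootstrap sample
    • Each sample is OOB for about 36.8% ($\approx 1/e$) of the trees
    • Predict the response for the i-th observation using only the trees for which it is OOB
    ? When should we record whether a sample is OOB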
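
    A rough sketch of OOB error estimation, assuming toy 1-D data and a threshold stump in place of a full classification tree (both are simplifications chosen to keep the example short; `fit_stump` is a hypothetical helper). In this sketch, OOB membership is recorded at the moment each bootstrap sample is drawn.

```python
import random

random.seed(0)

# Toy 1-D data (hypothetical): label is 1 when x > 0.5, with 10% label noise.
n = 200
X = [random.random() for _ in range(n)]
y = [int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5) for x in X]

def fit_stump(xs, ys):
    """Return the threshold t minimising training error of 'predict 1 if x > t'."""
    best_t, best_err = 0.0, len(xs) + 1
    for t in xs:
        err = sum(int(x > t) != label for x, label in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

n_trees = 50
votes = [[0, 0] for _ in range(n)]   # OOB vote counts per observation: [class 0, class 1]

for _ in range(n_trees):
    idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample of indices
    in_bag = set(idx)
    t = fit_stump([X[i] for i in idx], [y[i] for i in idx])
    for i in range(n):
        if i not in in_bag:                         # observation i is OOB for this model
            votes[i][int(X[i] > t)] += 1

# Score only observations that were OOB for at least one model.
scored = [i for i in range(n) if sum(votes[i]) > 0]
oob_error = sum((votes[i][1] > votes[i][0]) != bool(y[i]) for i in scored) / len(scored)
print(oob_error)
```

    Because each observation is predicted only by models that never saw it, `oob_error` behaves like a test-set error without holding out any data.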

    Random Forests

    Decorrelates the individual bagged trees with a small tweak to how each split is chosen

    Random Feature Selection

    Essentially applies dropout to the input features: each split sees only a random subset of them
    At each node, select a random subset of the predictors
    Split on the best predictor within that subset
    In practice:


    Random forests usually predict better than bagged trees, because averaging less-correlated trees reduces variance further
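
    The per-node random feature selection described above could look roughly like this. The subset size `m = floor(sqrt(p))` is an assumed default that is not stated in these notes, and `gini`, `best_split` and `random_forest_split` are hypothetical helper names.

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2 * p1 * (1 - p1)

def best_split(X, y, features):
    """Lowest weighted impurity (score, feature, threshold) over the candidate features."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def random_forest_split(X, y, rng, m=None):
    """Choose a split at one node using only a random subset of m predictors."""
    p = len(X[0])
    if m is None:
        m = max(1, int(math.sqrt(p)))        # assumed default, not from the notes
    candidates = rng.sample(range(p), m)     # random subset, without replacement
    return candidates, best_split(X, y, candidates)

rng = random.Random(0)
X = [[rng.random() for _ in range(9)] for _ in range(100)]
y = [int(row[3] > 0.5) for row in X]         # toy data: only feature 3 matters
candidates, (score, feature, threshold) = random_forest_split(X, y, rng)
print(candidates, feature, threshold)
```

    When the strongest predictor is not among the candidates, the node is forced to split on something else, which is what makes the trees differ from one another.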

    Boosting (AdaBoost)

    A sequential (iterative) process: combine multiple weak classifiers to classify non-linearly-separable data
    Weak learner: a classification model with accuracy only slightly above 50%
    • With a strong learner as the base model, boosting is very likely to overfit
    1. Associate a weight $w_i$ with each training sample, initially $w_i = 1/n$
    1. Loop until convergence:
      1. Train the weak learner on a bootstrap sample of the weighted training set
      2. Increase $w_i$ for misclassified samples; decrease it otherwise


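    Do a weighted majority vote over the trained classifiers $h_t$ (labels in $\{-1, +1\}$):
    $H(x) = \operatorname{sign}\left(\sum_{t} \alpha_t h_t(x)\right)$

    • $\alpha_t$ = weight for model $t$
    • $\epsilon_t$ = classification error for model $t$
    • $w_i$ = weight of sample $i$, computed from model $t$'s classification errors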
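
    A compact sketch of the boosting loop, assuming labels in {-1, +1}, threshold stumps as the weak learner, and the common AdaBoost choice $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$ (the exact formulas used in the lecture are not preserved in these notes). The data and the helpers `fit_stump`, `stump_predict` and `predict` are illustrative.

```python
import math
import random

random.seed(0)

# Toy 1-D data (hypothetical): +1 inside (0.3, 0.7), -1 outside.
# No single threshold separates the classes.
n = 200
X = [random.random() for _ in range(n)]
y = [1 if 0.3 < x < 0.7 else -1 for x in X]

def fit_stump(xs, ys):
    """Weak learner h(x) = s if x > t else -s, minimising error on (xs, ys)."""
    best = (len(xs) + 1, 0.0, 1)
    for t in [0.0] + xs:                      # t = 0.0 allows a constant classifier
        for s in (1, -1):
            err = sum((s if x > t else -s) != label for x, label in zip(xs, ys))
            if err < best[0]:
                best = (err, t, s)
    return best[1], best[2]

def stump_predict(stump, x):
    t, s = stump
    return s if x > t else -s

w = [1 / n] * n                               # initial sample weights
ensemble = []                                 # list of (alpha_t, stump_t)
for _ in range(30):
    # Bootstrap sample drawn according to the current sample weights.
    idx = random.choices(range(n), weights=w, k=n)
    stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
    preds = [stump_predict(stump, x) for x in X]
    # Weighted classification error on the full training set, clipped for safety.
    eps = sum(wi for wi, p, label in zip(w, preds, y) if p != label)
    eps = min(max(eps, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - eps) / eps)   # model weight (assumed formula)
    ensemble.append((alpha, stump))
    # Increase weights of misclassified samples, decrease the others, renormalise.
    w = [wi * math.exp(-alpha * label * p) for wi, p, label in zip(w, preds, y)]
    total = sum(w)
    w = [wi / total for wi in w]

def predict(x):
    """Weighted majority vote over all trained stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

accuracy = sum(predict(x) == label for x, label in zip(X, y)) / n
print(accuracy)
```

    The best single stump on this data gets roughly 70% of the training points right, so any accuracy well above that comes from combining the weak learners.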

    Gradient Boosting

    Fit a model on the residuals and add it to the original model


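    Given an imperfect model $F$
    1. Cannot delete anything from model $F$
    1. Can only add an additional model $h$ to $F$ to get $F(x) + h(x)$
      1. By intuition: want $F(x_i) + h(x_i) = y_i$, i.e. $h(x_i) = y_i - F(x_i)$
    Use regression to find $h$


    Training set = residuals of the model: $(x_i,\ y_i - F(x_i))$
    Train $h$ on this training set
    New model = $F + h$
    Repeat until satisfied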
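
    The residual-fitting loop can be sketched as follows, assuming a toy regression target, an initial model $F = 0$, and least-squares stumps as the added models (all three are assumptions made for illustration; `fit_regression_stump` is a hypothetical helper).

```python
import math
import random

random.seed(0)

# Toy regression data (hypothetical): y = sin(x) on [0, 2*pi].
n = 200
X = [random.uniform(0, 2 * math.pi) for _ in range(n)]
y = [math.sin(x) for x in X]

def fit_regression_stump(xs, targets):
    """Least-squares stump: predict the mean target on each side of a threshold."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        left = [targets[i] for i in order[:k]]
        right = [targets[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            t = (xs[order[k - 1]] + xs[order[k]]) / 2
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

models = []                      # F is the sum of all models added so far

def F(x):
    return sum(h(x) for h in models)

mse_history = []
for _ in range(50):
    # Training set for the next model: residuals of the current model F.
    residuals = [yi - F(xi) for xi, yi in zip(X, y)]
    mse_history.append(sum(r * r for r in residuals) / n)
    # Fit h on the residuals; the new model is F + h.
    models.append(fit_regression_stump(X, residuals))

final_mse = sum((yi - F(xi)) ** 2 for xi, yi in zip(X, y)) / n
print(mse_history[0], final_mse)
```

    Each stump is a poor model on its own, but the training error never increases from one round to the next, since every added stump is the least-squares fit to what is still unexplained.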

    Theory: Relation to Gradients

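    Squared loss function: $L(y, F(x)) = \frac{1}{2}(y - F(x))^2$
    Minimizing $J = \sum_i L(y_i, F(x_i))$ wrt $F(x_i)$ gives $\frac{\partial J}{\partial F(x_i)} = F(x_i) - y_i$
    ⇒ Residuals = negative gradients: $y_i - F(x_i) = -\frac{\partial J}{\partial F(x_i)}$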

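    Update rule: $F(x_i) \leftarrow F(x_i) + h(x_i)$ with $h(x_i) \approx -\frac{\partial J}{\partial F(x_i)}$, i.e. a gradient-descent step on the total loss $J$

    Other differentiable loss functions can be used: fit $h$ to the negative gradients instead of the residuals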
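
    A quick numerical check of the residual/negative-gradient relation, plus one alternative loss. The absolute loss is chosen here only as an example of "another loss function"; it is not named in the notes, and all function names are illustrative.

```python
def squared_loss(y, f):
    return 0.5 * (y - f) ** 2

def absolute_loss(y, f):
    return abs(y - f)

def neg_gradient_squared(y, f):
    # L = 0.5 * (y - f)^2  ->  -dL/df = y - f, i.e. the residual
    return y - f

def neg_gradient_absolute(y, f):
    # L = |y - f|  ->  -dL/df = sign(y - f)
    return (y > f) - (y < f)

def numeric_neg_gradient(loss, y, f, h=1e-6):
    """Central finite-difference estimate of -dL/df."""
    return -(loss(y, f + h) - loss(y, f - h)) / (2 * h)

y_true, f_pred = 3.0, 1.0
print(neg_gradient_squared(y_true, f_pred),
      numeric_neg_gradient(squared_loss, y_true, f_pred))
print(neg_gradient_absolute(y_true, f_pred),
      numeric_neg_gradient(absolute_loss, y_true, f_pred))
```

    With squared loss the next model is fit to the residuals themselves; with absolute loss it would be fit to their signs, which makes the procedure less sensitive to outliers.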

    Gradient Boosted Decision Trees (if time permits)