Page cover image

#10: Decision Trees

    Nov 2, 2021 06:10 PM
    + Random Forest

    Decision Trees

    Motivation and intuition behind trees


    • Node: tests a single attribute
    • Branch: one for each value of the attribute
    • Leaf: assign a class to the sample

    Function Class

    A decision tree can represent any function of the input

    What Makes a Good Decision Tree

    Principle of Occam’s Razor: the simpler the better
    • Learning the simplest & smallest decision tree ⇒ NP-complete problem (exponential computational complexity, hard to solve)

    The Greedy Learning Algorithm

    1. Start with an empty tree
    1. Select the best possible attribute to split on
      1. Generate child nodes: one for each value
    1. Partition samples according to their values & assign to appropriate child node
    1. Recurse: repeat for each child node; return if all samples at a node are in the same class

    Choosing the Best Variable to Split

    More certainty about the classification
    • Good fit: all samples of different classes goes to different nodes
    • Bad fit: equal number of samples fall in both classes

    Theorizing Choosing the Best Variable


    The expected number of bits needed to encode a randomly drawn value of under the most efficient code
    High: uniform-like distribution, less predictable
    Low: has peeks, more predictable

    Conditional Entropy

    Conditioned on

    Information Gain

    Decrease in entropy after splitting at feature
    Choose a variable that maximizes information gain

    Know When to Stop

    • Biased toward finding a simpler tree
    • If stop too late: all samples will have its own class

    Case When Information Gain

    Involve an over-split hyperparameter
    • Over-split: randomly split & explore a few more levels
    • Then prune unused nodes

    Choosing a Best Decision Tree

    Decision trees will overfit!
    • Standard D-Tree: Bias = 0
    • High variance: slight change → huge difference

    Heuristics Toward Obtaining Simper Trees

    • Lower bounding # of samples per leaf node
    • Upper bounding depth
    • Ensemble Learning (Random Forests)

    Continuous Variable Decision Tree

    Decision Stumps

    Determine a threshold (Decision Stumps) to split for variable
    • Branch 1:
    • Branch 2:
    Can allow repeated splits on the same variable w/ different thresholds


    ❌ search through all possible values
    ✔️ Sort data
    • Consider splits at
    • Only split at t with most IG


    Interpretation of Decision Trees

    1. May be readable but may not use human logic
    1. Need to use heuristics to avoid overfitting (depth limit, leaf bin size threshold, etc.)
    1. Very high variance: a small change in data will result in a completely different tree

    Ensemble Learning

    Reduce variance in models by
    • Train multiple models trained on the same data set
    • Perform an average over predictions
    #11: Ensemble Learning