Created: Nov 2, 2021 06:10 PM

Topics: Random Forest


## Decision Trees

### Motivation and intuition behind trees

### Composition

- Node: tests a single attribute

- Branch: one for each value of the attribute

- Leaf: assigns a class to the sample

#### Function Class

A decision tree can represent any function of the input attributes; e.g., any Boolean function can be encoded as a tree with one root-to-leaf path per truth-table row

### What Makes a Good Decision Tree

**Principle of Occam’s Razor**: the simpler the better

- Learning the simplest & smallest decision tree is an NP-complete problem (no known efficient algorithm), which motivates the greedy heuristic below

### The Greedy Learning Algorithm

- Start with an empty tree

- Select the **best possible attribute** to split on

- Generate child nodes: one for each value of that attribute

- Partition samples according to their values & assign to appropriate child node

- Recurse: repeat for each child node; return **if all samples at a node are in the same class** (a sketch of this loop follows the list)
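A minimal sketch of the greedy loop in Python. This is not from the notes: `grow_tree`, `information_gain`, and the dict-based tree encoding are illustrative choices, assuming categorical features given as a list of attribute→value dicts.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(Y) = -sum_c p(c) * log2 p(c) over class frequencies
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attr):
    # IG = H(Y) - sum_v P(attr = v) * H(Y | attr = v)
    n = len(labels)
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(s[attr], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def grow_tree(samples, labels, attributes):
    # Stop: all samples at this node are in the same class
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: fall back to the majority class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(samples, labels, a))
    rest = [a for a in attributes if a != best]
    children = {}
    # One child per observed value; partition the samples and recurse
    for value in {s[best] for s in samples}:
        part = [(s, y) for s, y in zip(samples, labels) if s[best] == value]
        children[value] = grow_tree([p[0] for p in part], [p[1] for p in part], rest)
    return {"split_on": best, "children": children}
```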

### Choosing the Best Variable to Split

The best split yields the most certainty about the classification afterwards

**Good fit:** samples of different classes go to different child nodes

**Bad fit:** an equal number of samples from each class falls into every child node

### Theorizing Choosing the Best Variable

#### Entropy

The expected number of bits needed to encode a randomly drawn value of $X$ under the most efficient code: $H(X) = -\sum_x P(X = x) \log_2 P(X = x)$

High: uniform-like distribution, less predictable

Low: distribution has peaks, more predictable

#### Conditional Entropy

Entropy of the label $Y$ conditioned on a feature $X$: $H(Y \mid X) = \sum_x P(X = x)\, H(Y \mid X = x)$

#### Information Gain

Decrease in the label's entropy after splitting on feature $X$: $IG(X) = H(Y) - H(Y \mid X)$

Choose the variable that maximizes information gain
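A quick, self-contained numeric check of these definitions (the 16-sample toy split is hypothetical): the "good fit" split from earlier earns the full 1 bit of gain, while the "bad fit" split earns none.

```python
from math import log2

def binary_entropy(p):
    # Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

# Parent node: 8 positives, 8 negatives -> H = 1 bit
h_parent = binary_entropy(8 / 16)

# Split A ("good fit"): children (8+, 0-) and (0+, 8-) -> each side is pure
h_a = (8 / 16) * binary_entropy(1.0) + (8 / 16) * binary_entropy(0.0)

# Split B ("bad fit"): children (4+, 4-) and (4+, 4-) -> each side stays at 1 bit
h_b = (8 / 16) * binary_entropy(4 / 8) + (8 / 16) * binary_entropy(4 / 8)

print(h_parent - h_a)  # IG = 1.0: maximally informative split
print(h_parent - h_b)  # IG = 0.0: the split tells us nothing
```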

### Know When to Stop

- Stopping early biases the search toward simpler trees

- If we stop too late, every sample ends up in its own leaf (severe overfitting)

#### Case When Information Gain Is Zero

Splitting can still help even when every individual feature has zero information gain (e.g., for XOR-like labels $y = x_1 \oplus x_2$, neither variable alone reduces entropy, yet two splits fit the data perfectly). One remedy involves an **over-split hyperparameter**:

- **Over-split**: randomly split & explore a few more levels

- Then **prune** unused nodes

### Choosing a Best Decision Tree

Decision trees will overfit!

- Standard (fully grown) D-Tree: bias = 0, since it can fit the training data exactly

- High variance: a slight change in the data → a hugely different tree

#### Heuristics Toward Obtaining Simpler Trees

- Lower bounding # of samples per leaf node

- Upper bounding depth

- **Ensemble Learning (Random Forests)**: see the Ensemble Learning section below
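For instance, with scikit-learn's `DecisionTreeClassifier`, the first two heuristics map onto the `max_depth` and `min_samples_leaf` parameters (the specific bounds and toy data below are illustrative, not from the notes):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (XOR-style labels)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = DecisionTreeClassifier(
    max_depth=5,         # upper-bound the tree depth
    min_samples_leaf=2,  # lower-bound the # of samples per leaf node
).fit(X, y)
```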

## Continuous Variable Decision Tree

**Decision Stumps**

A decision stump makes a single threshold split: determine a threshold $t$ to split on for a variable $x_j$

- Branch 1: $x_j \le t$

- Branch 2: $x_j > t$

Can allow repeated splits on the same variable w/ different thresholds

### Choosing $t$

❌ Search through all possible real values of $t$

✔️ Sort the data by the variable's value

- Consider splits only at midpoints between consecutive sorted values, $t = (x_{(i)} + x_{(i+1)})/2$

- Only split at the $t$ with the highest IG (sketched below)
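A sketch of this sorted-midpoint search. The `best_threshold` helper is a hypothetical name, assuming a single feature given as a list of values plus discrete labels.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(Y) = -sum_c p(c) * log2 p(c) over class frequencies
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(x, y):
    """Return (t, gain) for the midpoint split with the highest IG."""
    pairs = sorted(zip(x, y))          # sort once by feature value: O(n log n)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n, h_parent = len(ys), entropy(ys)
    best_t, best_gain = None, 0.0
    for i in range(n - 1):
        if xs[i] == xs[i + 1]:
            continue                   # no threshold separates equal values
        t = (xs[i] + xs[i + 1]) / 2    # candidate: midpoint between neighbors
        left, right = ys[: i + 1], ys[i + 1:]
        gain = (h_parent
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# e.g. best_threshold([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]) -> (2.5, 1.0)
```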

### Formulas
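The equations for this section appear to have been lost in export; reconstructed from the definitions above, the information gain of a threshold split on $x_j$ would presumably be

$$IG(t) = H(Y) - \frac{n_{\le}}{n}\, H(Y \mid x_j \le t) - \frac{n_{>}}{n}\, H(Y \mid x_j > t)$$

where $n_{\le}$ and $n_{>}$ count the samples falling into each branch.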

### Interpretation of Decision Trees

- Trees may be readable, but their splits may not follow human logic

- Need to use heuristics to avoid overfitting (depth limit, leaf bin size threshold, etc.)

- Very high variance: a small change in data will result in a completely different tree

## Ensemble Learning

Reduce variance in models by

- Training multiple models on the same data set

- Averaging over their predictions
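A minimal sketch of this averaging, assuming scikit-learn trees as the base models. The notes train every model on the same data set; one common way to make the averaged trees actually differ (the bagging variant used by random forests) is to give each tree a bootstrap resample, as done here. `ensemble_predict` and the binary 0/1 labels are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_predict(X_train, y_train, X_test, n_models=100, seed=0):
    """Average n_models trees; expects numpy arrays and binary 0/1 labels."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        # Bootstrap resample so the averaged trees differ from one another
        idx = rng.integers(0, len(X_train), size=len(X_train))
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes += tree.predict(X_test)            # accumulate 0/1 votes
    return (votes / n_models > 0.5).astype(int)  # average, then threshold
```

Averaging many high-variance, low-bias trees keeps the low bias while shrinking the variance, which is exactly why fully grown trees make good ensemble members.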