Created: Oct 5, 2021 06:08 PM

Topics: Dual Form Solution, Kernel Trick, Hinge Loss


## Logistic Regression Review

## Multi-Class Classification

Use logistic regression models to classify among multiple classes

#### Likelihood for a Single Sample (`Softmax` Function)

The `softmax` function turns a vector of **real values** into a vector of **real values that sum to 1**:

$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

#### Likelihood of the Data

#### Loss Function (Cross-Entropy Loss)

- $\ell(y, \hat{y}) = -\sum_{j} y_j \log \hat{y}_j$, where $\hat{y}_j$ is the value of the output corresponding to the j-th class
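As a concrete illustration, a minimal NumPy sketch of both functions (not from the original notes; the names `logits` and `y_true` are my own):

```python
import numpy as np

def softmax(logits):
    """Turn a vector of real values into probabilities that sum to 1."""
    z = logits - np.max(logits)              # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def cross_entropy(y_true, probs):
    """Cross-entropy loss for a one-hot label vector y_true."""
    return -np.sum(y_true * np.log(probs + 1e-12))  # epsilon avoids log(0)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)                      # approx [0.659, 0.242, 0.099]
print(probs.sum())                           # 1.0
print(cross_entropy(np.array([1, 0, 0]), probs))
```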

## The Perceptron Model

The bias term shifts the hyperplane away from the origin, so the decision boundary need not pass through the origin
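A minimal sketch of the perceptron update rule with a bias term (my own illustration, assuming labels in $\{-1, +1\}$):

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Perceptron: on each mistake, nudge (w, b) toward the misclassified point."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # misclassified (or on the boundary)
                w += y[i] * X[i]             # rotate the hyperplane toward the correct side
                b += y[i]                    # shift the hyperplane off the origin
    return w, b
```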

## Support Vector Machine

Seeks a **large margin** separator to improve generalization

- Largest margin: farthest from the 2 classes

Uses **optimization** to find efficient and optimal solutions

Uses the **kernel trick** to make computations efficient, especially when the feature vectors are huge

- Projects data to a high-dimensional space (see the sketch below)
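For instance, a quadratic kernel computes the inner product of an implicit high-dimensional feature map without ever forming it. A small NumPy check (my own illustration):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input."""
    x1, x2 = x
    return np.array([x1*x1, x2*x2, np.sqrt(2)*x1*x2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def quad_kernel(x, z):
    """Kernel trick: (x.z + 1)^2 equals <phi(x), phi(z)>, computed in O(d) time."""
    return (x @ z + 1.0) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(x) @ phi(z))      # explicit high-dimensional inner product: 25.0
print(quad_kernel(x, z))    # same value, computed in the original space: 25.0
```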

### Notations

Input: $x \in \mathbb{R}^d$

Output (labels): $y \in \{-1, +1\}$

Parameters: $w \in \mathbb{R}^d$

Bias (Intercept): $b \in \mathbb{R}$

### Functional Margin

Large margin → correct prediction & high confidence

Functional margin of a single training sample $(x^{(i)}, y^{(i)})$ w.r.t. $(w, b)$:

$$\hat{\gamma}^{(i)} = y^{(i)} (w^\top x^{(i)} + b)$$

- If $y^{(i)} = +1$, then for $\hat{\gamma}^{(i)}$ to be large we need $w^\top x^{(i)} + b \gg 0$

- If $y^{(i)} = -1$, then for $\hat{\gamma}^{(i)}$ to be large we need $w^\top x^{(i)} + b \ll 0$

Note that when the above is true, the prediction is also correct: $y^{(i)} (w^\top x^{(i)} + b) > 0$

#### Scaling

Normalize the parameters to ensure $\|w\| = 1$ without affecting the solution

- The geometric margin is invariant under scaling of $(w, b)$

- The functional margin is not invariant under scaling: replacing $(w, b)$ with $(2w, 2b)$ doubles $\hat{\gamma}$ without changing the decision boundary

### Geometric Margin

Distance of a point $x^{(i)}$ from the hyperplane $w^\top x + b = 0$

#### Property 1

$\frac{w}{\|w\|}$ is the unit vector normal to the hyper-plane

- For $x$ lying on the hyper-plane, $w^\top x + b = 0$

#### Property 2

For any points $x_1, x_2$ on the hyper-plane we have $w^\top (x_1 - x_2) = 0$, i.e., $w$ is orthogonal to the hyper-plane

#### Property 3

**Signed distance** between any point $x$ and the hyper-plane is given by $\frac{w^\top x + b}{\|w\|}$

#### Geometric Margin of a Particular Class

$$\gamma^{(i)} = y^{(i)} \left( \frac{w^\top x^{(i)} + b}{\|w\|} \right)$$

#### Geometric Margin of Both Classes

- A scaled version of the **Functional Margin**: $\gamma^{(i)} = \frac{\hat{\gamma}^{(i)}}{\|w\|}$

- Invariant to scaling of the parameters $(w, b)$

#### Geometric Margin for the Data Set

Distance to the closest point of either of the classes: $\gamma = \min_{i} \gamma^{(i)}$
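A quick sketch computing these quantities for a toy data set (my own illustration, using the definitions above):

```python
import numpy as np

# Toy separable data, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

functional = y * (X @ w + b)                # y(i) * (w.x(i) + b)
geometric = functional / np.linalg.norm(w)  # scaled by 1/||w||, invariant to scaling
print(geometric.min())                      # margin of the data set: closest point
```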

### Maximum Margin Classifiers

**Maximize** the distance to the **closest point** of either of the classes

- Discards other points; only the closest point matters

Assumption: the data set is linearly separable

#### Constrained Optimization Problem

**Constraints**

- Geometric margin ≥ $\gamma$: make sure all points are farther than $\gamma$ → then maximize $\gamma$

$$\max_{\gamma, w, b} \; \gamma \quad \text{s.t. } y^{(i)} (w^\top x^{(i)} + b) \ge \gamma, \; i = 1, \dots, m, \quad \|w\| = 1$$

- $\|w\| = 1$ is imposed to ensure Functional Margin = Geometric Margin
- Simplifies the equation
- $\|\cdot\|$ is the L-2 norm

Simplify: **Use Functional Margin Instead**

- Remove the constraint $\|w\| = 1$

- Modify the objective to $\max_{\hat{\gamma}, w, b} \frac{\hat{\gamma}}{\|w\|}$

- Set $\hat{\gamma} = 1$ (scaling constraint on $w, b$)

### 1️⃣ Primal SVM Problem Statement

Goes directly for the optimal margin separator; simplified from the definition above:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t. } y^{(i)} (w^\top x^{(i)} + b) \ge 1, \; i = 1, \dots, m$$

- Convex quadratic optimization with linear constraints
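Since this is a convex QP, it can be handed to an off-the-shelf solver. A sketch using `cvxpy` (my own choice of solver; the notes don't prescribe one):

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
# Primal SVM: minimize (1/2)||w||^2  s.t.  y_i (w.x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```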

## Optimization Overview

### Unconstrained Optimization

$$\min_{x} f(x)$$

### Constrained Optimization

$$\min_{x} f(x) \quad \text{s.t. } g_i(x) \le 0, \; i = 1, \dots, k$$

- An equality constraint ↔️ two inequality constraints: $h(x) = 0 \Leftrightarrow h(x) \le 0$ and $-h(x) \le 0$

### Lagrangian Multiplier

Reduces a problem of $n$ variables with $k$ **constraints** to an **unconstrained** problem of $n + k$ variables

- Introduce a multiplier $\alpha_i \ge 0$ for each constraint

- Compute the linear combination of the multipliers & constraints

- Optimizing the Lagrangian w.r.t. $\alpha$ ↔️ evaluating $f$ at feasible points

If $x$ satisfies all the constraints, then $\max_{\alpha \ge 0} \mathcal{L}(x, \alpha) = f(x)$

Solution to $\min_x \max_{\alpha \ge 0} \mathcal{L}(x, \alpha)$ is the same as the solution to the original constrained problem (written out below)
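In symbols, for $\min_x f(x)$ s.t. $g_i(x) \le 0$ (the standard construction):

$$\mathcal{L}(x, \alpha) = f(x) + \sum_{i=1}^{k} \alpha_i g_i(x), \qquad \max_{\alpha \ge 0} \mathcal{L}(x, \alpha) = \begin{cases} f(x) & \text{if } g_i(x) \le 0 \;\; \forall i \\ +\infty & \text{otherwise} \end{cases}$$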

#### Dual Problem Statement

A reduction of the primal problem: swap the order of min and max, giving $d^* = \max_{\alpha \ge 0} \min_x \mathcal{L}(x, \alpha)$

- $d^* \le p^*$: always true (weak duality)

- $d^* = p^*$: sometimes true, under certain conditions (strong duality)

#### Conditions for $d^* = p^*$

- $f$ and $g_i$ are convex

- $h_i$ is affine/linear: $h_i(x) = a_i^\top x + b_i$

- The constraints are strictly feasible: there exists $x$ with $g_i(x) < 0$ for all $i$

#### Solutions

$(x^*, \alpha^*, \beta^*)$ satisfying the KKT conditions

- $x^*$ solves the primal problem

- $(\alpha^*, \beta^*)$ solves the dual problem

#### KKT (Karush-Kuhn-Tucker) Conditions
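Written out, for $\min_x f(x)$ s.t. $g_i(x) \le 0$, $h_j(x) = 0$ (the standard statement):

$$\begin{aligned} \nabla_x \mathcal{L}(x^*, \alpha^*, \beta^*) &= 0 && \text{(stationarity)} \\ g_i(x^*) \le 0, \;\; h_j(x^*) &= 0 && \text{(primal feasibility)} \\ \alpha_i^* &\ge 0 && \text{(dual feasibility)} \\ \alpha_i^* \, g_i(x^*) &= 0 && \text{(complementary slackness)} \end{aligned}$$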

Complementarity Condition: $\alpha_i^* \, g_i(x^*) = 0$ ➡️ $\alpha_i^* > 0 \Rightarrow g_i(x^*) = 0$ ➡️ the constraint is **active**

### 2️⃣ Dual SVM Problem Statement

#### The Lagrangian is Given By

$$\mathcal{L}(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)} (w^\top x^{(i)} + b) - 1 \right]$$

#### KKT Conditions

$$\nabla_w \mathcal{L} = w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$

$$\frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^{m} \alpha_i y^{(i)} = 0$$

#### Dual Formulation

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \quad \text{s.t. } \alpha_i \ge 0, \; \sum_{i=1}^{m} \alpha_i y^{(i)} = 0$$

#### Solution

$$w^* = \sum_{i=1}^{m} \alpha_i^* y^{(i)} x^{(i)}, \qquad b^* = -\frac{\max_{i: y^{(i)} = -1} w^{*\top} x^{(i)} + \min_{i: y^{(i)} = +1} w^{*\top} x^{(i)}}{2}$$

#### Understanding the Constraints

$\alpha_i \left[ y^{(i)} (w^\top x^{(i)} + b) - 1 \right] = 0$ ➡️ Either $\alpha_i = 0$ or $y^{(i)} (w^\top x^{(i)} + b) = 1$

$\alpha_i > 0$ ➡️ $x^{(i)}$ lies exactly on the margin; these points are the **support vectors**

The hyperplane is defined by only the few closest data points
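Given a dual solution `alpha` (assumed obtained from any QP solver), the primal parameters follow directly; a NumPy sketch of my own:

```python
import numpy as np

def primal_from_dual(alpha, X, y, tol=1e-8):
    """Recover (w, b) from dual variables; only support vectors (alpha > 0) matter."""
    w = (alpha * y) @ X                    # w = sum_i alpha_i y_i x_i
    sv = alpha > tol                       # indices of support vectors
    b = np.mean(y[sv] - X[sv] @ w)         # on the margin: y_i (w.x_i + b) = 1
    return w, b, np.flatnonzero(sv)
```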

### Predictions with SVMs

#### Naïve Approach

Compute $w^\top x + b$; assign class $+1$ if positive, class $-1$ if negative

#### Better Approach

Express the decision value through the dual solution, so only inner products with support vectors are needed:

$$w^\top x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$$

- Only the support vectors ($\alpha_i > 0$) contribute; replacing $\langle x^{(i)}, x \rangle$ with a kernel $K(x^{(i)}, x)$ gives the kernel trick
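A sketch of kernelized prediction (my own illustration; `alpha_sv`, `b`, and the kernel choice are assumed given):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: inner product in an infinite-dimensional space."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def predict(x, X_sv, y_sv, alpha_sv, b, kernel=rbf_kernel):
    """Decision value uses only support vectors and kernel evaluations."""
    score = sum(a * yi * kernel(xi, x)
                for a, yi, xi in zip(alpha_sv, y_sv, X_sv)) + b
    return 1 if score > 0 else -1
```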

### Summary

## When Linear Separability Does Not Exist

Some constraints are violated no matter the choice of parameters

⇒ Geometric margin loses its meaning

### Solution 1: Find $(w, b)$ s.t. the min # of constraints are violated

⇒ NP-Hard Problem

### Solution 2: Minimize Slack Penalty

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t. } y^{(i)} (w^\top x^{(i)} + b) \ge 1 - \xi_i, \; \xi_i \ge 0$$

If the functional margin is ≥ 1, then no penalty ($\xi_i = 0$)

If the functional margin is < 1, then pay a linear penalty ($\xi_i = 1 - y^{(i)} (w^\top x^{(i)} + b)$)
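In practice this trade-off is the `C` parameter of a soft-margin SVM, e.g. in scikit-learn (an illustration; the notes don't reference a specific library):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0], [0.4, 0.3]])
y = np.array([1, 1, -1, -1, 1])          # last point makes the data noisy

# Smaller C -> more slack tolerated (wider margin); larger C -> fewer violations
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)              # the points with alpha_i > 0
```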

### Hinge Loss

$$\ell_{\text{hinge}}(z) = \max(0, 1 - z), \qquad z = y (w^\top x + b)$$

This is the tightest convex upper bound of the intractable 0/1 loss

#### Intuition

$y$ and $(w^\top x + b)$ have the **same** sign ⇒ correct classification ⇒ $y (w^\top x + b) > 0$

$y$ and $(w^\top x + b)$ have **different** signs ⇒ incorrect classification ⇒ $y (w^\top x + b) < 0$

#### Formulation

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \max\left(0, \; 1 - y^{(i)} (w^\top x^{(i)} + b)\right)$$

If $C \to \infty$, then you have to perfectly separate the data ⇒ linearly separable solutions only

If $C = 0$, then the data is completely ignored

Optimization over Hinge Loss with an $\ell_2$ regularization term:

$$\min_{w, b} \; \sum_{i=1}^{m} \max\left(0, \; 1 - y^{(i)} (w^\top x^{(i)} + b)\right) + \lambda \|w\|^2$$
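A minimal subgradient-descent sketch for this objective (my own illustration; the bias is folded into `w` via a constant feature, and the hyperparameters are arbitrary):

```python
import numpy as np

def train_hinge(X, y, lam=0.01, lr=0.1, epochs=100):
    """Subgradient descent on sum_i max(0, 1 - y_i w.x_i) + lam * ||w||^2."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # fold bias into w
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        margins = y * (Xb @ w)
        active = margins < 1                     # samples paying hinge penalty
        # subgradient: -y_i x_i for active samples, plus 2*lam*w from the regularizer
        grad = -(y[active, None] * Xb[active]).sum(axis=0) + 2 * lam * w
        w -= lr * grad / len(X)
    return w
```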