
#7: SVM, Optimization

    Created
    Oct 5, 2021 06:08 PM
    Topics
    Dual Form Solution, Kernel Trick, Hinge Loss

    Logistic Regression Review


    Multi-Class Classification

    💡
Combine multiple Logistic Regression models to classify among more than two classes

    Likelihood for a Single Sample (Softmax Function)

The softmax function turns a vector of real values into a vector of probabilities that sum to 1.
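With one parameter vector $\theta_j$ per class, this gives the likelihood of a single sample:

$$p(y = j \mid x) = \frac{e^{\theta_j^\top x}}{\sum_{k=1}^{K} e^{\theta_k^\top x}}$$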

    Likelihood of the Data
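Assuming the $n$ training samples are i.i.d.:

$$L(\theta) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}; \theta)$$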

    Loss Function (Cross-Entropy Loss)

• the indicator $\mathbb{1}\{y^{(i)} = j\}$ selects the output value corresponding to the j-th (true) class
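In the standard form, the negative log-likelihood becomes:

$$J(\theta) = -\sum_{i=1}^{n} \sum_{j=1}^{K} \mathbb{1}\{y^{(i)} = j\} \log p(y^{(i)} = j \mid x^{(i)}; \theta)$$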

    The Perceptron Model

    💡
The bias term pushes the hyperplane away from the origin so it can match the decision boundary
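In symbols, the perceptron predicts with a hard threshold:

$$h(x) = \operatorname{sign}(w^\top x + b)$$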

    Support Vector Machine

    💡
    Seeks large margin separator to improve generalization
• Largest margin: the separator farthest from both classes
    🎯
    Uses optimization to find efficient and optimal solutions
    🛠
Uses the kernel trick to make computations efficient, especially in cases where the feature vector is huge
    • Projects data to a high-dimensional space
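A kernel $K(x, z) = \langle \phi(x), \phi(z) \rangle$ evaluates inner products in the high-dimensional space without ever computing $\phi$ explicitly, e.g. the RBF kernel $K(x, z) = \exp(-\gamma \|x - z\|^2)$.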

    Notations

Input: $x \in \mathbb{R}^d$
Output (labels): $y \in \{-1, +1\}$
Parameters: $w \in \mathbb{R}^d$
Bias (Intercept): $b \in \mathbb{R}$

    Functional Margin

    💡
    Large margin → correct prediction & high confidence
Functional margin of a single training sample $(x^{(i)}, y^{(i)})$ w.r.t. $(w, b)$: $\hat{\gamma}^{(i)} = y^{(i)}(w^\top x^{(i)} + b)$
• If $y^{(i)} = +1$, then for $\hat{\gamma}^{(i)}$ to be large we need $w^\top x^{(i)} + b \gg 0$
• If $y^{(i)} = -1$, then for $\hat{\gamma}^{(i)}$ to be large we need $w^\top x^{(i)} + b \ll 0$
Note that when the above is true, the prediction is also correct: $y^{(i)}(w^\top x^{(i)} + b) > 0$

    Scaling

    💡
Normalize the parameters to ensure $\|w\| = 1$ without affecting the solution
• The decision boundary (and hence the classifier) is invariant under scaling of $(w, b)$
• The functional margin is not invariant under scaling: replacing $(w, b)$ with $(cw, cb)$ multiplies $\hat{\gamma}$ by $c$

    Geometric Margin

Distance of a point $x^{(i)}$ from the hyperplane $w^\top x + b = 0$

    Property 1

$\frac{w}{\|w\|}$ is the unit vector orthogonal to the hyper-plane
• For $x_1, x_2$ lying on the hyper-plane, $w^\top (x_1 - x_2) = 0$

    Property 2

For any point $x_0$ on the hyper-plane we have $w^\top x_0 + b = 0$, i.e. $w^\top x_0 = -b$

    Property 3

Signed distance between any point $x$ and the hyper-plane is given by $\frac{w^\top x + b}{\|w\|}$

    Geometric Margin of a Particular Class
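Following the standard notation, the geometric margin of a single sample is

$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^{\!\top} x^{(i)} + \frac{b}{\|w\|} \right)$$

and the margin of a class is the minimum of $\gamma^{(i)}$ over that class's points.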

    Geometric Margin of Both Classes

• A scaled version of the Functional Margin: $\gamma^{(i)} = \hat{\gamma}^{(i)} / \|w\|$
• Invariant to scaling of the parameters $(w, b)$

    Geometric Margin for the Data Set

    💡
    Distance to the closest point of either of the classes
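$$\gamma = \min_{i = 1, \dots, n} \gamma^{(i)}$$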

    Maximum Margin Classifiers

    💡
    Maximize the distance to the closest point of either of the classes
• Discards all other points; only the closest ones matter
‼️
Assumption: the training data is linearly separable

    Constrained Optimization Problem

Constraints
1. Geometric margin ≥ $\gamma$: make sure all points are farther than $\gamma$ → then maximize $\gamma$
2. $\|w\| = 1$ imposed to ensure Functional Margin = Geometric Margin
    • simplifies the equation
    • L-2 norm
Simplify: Use Functional Margin Instead
• Remove the constraint $\|w\| = 1$
• Modify the objective to $\hat{\gamma} / \|w\|$
• Set $\hat{\gamma} = 1$ (scaling constraint on $w, b$)
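Putting it together, the problem becomes:

$$\max_{\hat{\gamma}, w, b} \; \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.} \quad y^{(i)}(w^\top x^{(i)} + b) \ge \hat{\gamma}, \; i = 1, \dots, n$$

and with $\hat{\gamma} = 1$, maximizing $1 / \|w\|$ is the same as minimizing $\|w\|^2$.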

1️⃣ Primal SVM Problem Statement

Goes directly for the optimal separator $(w, b)$
Simplified from the definition above:
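$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)}(w^\top x^{(i)} + b) \ge 1, \; i = 1, \dots, n$$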
    • Convex quadratic optimization with linear constraints

    Optimization Overview

    Unconstrained Optimization
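$$\min_x f(x)$$

For differentiable $f$, candidate solutions satisfy $\nabla f(x) = 0$.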

    Constrained Optimization

Equality constraint ↔️ 2 inequality constraints: $h(x) = 0 \iff h(x) \le 0$ and $-h(x) \le 0$

    Inequality Constraint

    Equality Constraint
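The general problem with both kinds of constraints:

$$\min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \; i = 1, \dots, k; \qquad h_j(x) = 0, \; j = 1, \dots, l$$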

    Lagrangian Multiplier

    🎯
Reduces a problem with $d$ variables and $k$ constraints to an unconstrained problem with $d + k$ variables
1. Introduce a multiplier $\alpha_i$ (or $\beta_j$) for each constraint
2. Compute the linear combination of multipliers & constraints to form the Lagrangian $\mathcal{L}$
3. Optimizing the Lagrangian w.r.t. $x$ ↔️ evaluating it at $\nabla_x \mathcal{L} = 0$
     
If $x$ satisfies all the constraints, then $\max_{\alpha \ge 0, \beta} \mathcal{L}(x, \alpha, \beta) = f(x)$
Solution to $\min_x \max_{\alpha \ge 0, \beta} \mathcal{L}(x, \alpha, \beta)$ is the same as the solution to the original constrained problem
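For the general problem above:

$$\mathcal{L}(x, \alpha, \beta) = f(x) + \sum_{i=1}^{k} \alpha_i g_i(x) + \sum_{j=1}^{l} \beta_j h_j(x), \quad \alpha_i \ge 0$$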

    Dual Problem Statement

    💡
A reduction of the primal problem: swap the order of min and max
• $d^* \le p^*$ is always true (weak duality)
• $d^* = p^*$ holds under certain conditions (strong duality)
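$$d^* = \max_{\alpha \ge 0, \beta} \min_x \mathcal{L}(x, \alpha, \beta) \;\le\; \min_x \max_{\alpha \ge 0, \beta} \mathcal{L}(x, \alpha, \beta) = p^*$$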

Conditions for $d^* = p^*$

1. $f$ and $g_i$ are convex
2. $h_j$ is affine/linear: $h_j(x) = a_j^\top x + c_j$
3. Constraints are strictly feasible: $\exists x$ such that $g_i(x) < 0$ for all $i$

    Solutions

$x^*, \alpha^*, \beta^*$ satisfying the KKT conditions
• $x^*$ solves the primal problem
• $(\alpha^*, \beta^*)$ solves the dual problem

    KKT (Karush-Kuhn-Tucker) Conditions
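$$\begin{aligned} \nabla_x \mathcal{L}(x^*, \alpha^*, \beta^*) &= 0 && \text{(stationarity)} \\ g_i(x^*) \le 0, \quad h_j(x^*) &= 0 && \text{(primal feasibility)} \\ \alpha_i^* &\ge 0 && \text{(dual feasibility)} \\ \alpha_i^* \, g_i(x^*) &= 0 && \text{(complementary slackness)} \end{aligned}$$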

Complementarity Condition ➡️ $\alpha_i^* > 0$ ➡️ $g_i(x^*) = 0$, i.e. the constraint is active

    2️⃣ Dual SVM Problem Statement

    The Lagrangian is Given By
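$$\mathcal{L}(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y^{(i)}(w^\top x^{(i)} + b) - 1 \right]$$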

    KKT Conditions
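Setting the gradients w.r.t. $w$ and $b$ to zero:

$$w = \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)}, \qquad \sum_{i=1}^{n} \alpha_i y^{(i)} = 0$$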

    Dual formulation
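Substituting these back into the Lagrangian:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y^{(i)} y^{(j)} \langle x^{(i)}, x^{(j)} \rangle \quad \text{s.t.} \quad \alpha_i \ge 0, \; \sum_{i=1}^{n} \alpha_i y^{(i)} = 0$$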

    Solution
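$$w^* = \sum_{i=1}^{n} \alpha_i^* y^{(i)} x^{(i)}, \qquad b^* = -\frac{\max_{i: y^{(i)} = -1} w^{*\top} x^{(i)} + \min_{i: y^{(i)} = +1} w^{*\top} x^{(i)}}{2}$$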

    Understanding the Constraints

$\alpha_i \left[ y^{(i)}(w^\top x^{(i)} + b) - 1 \right] = 0$ ➡️ Either $\alpha_i = 0$ or $y^{(i)}(w^\top x^{(i)} + b) = 1$
$\alpha_i > 0$ ➡️ $x^{(i)}$ lies exactly on the margin (a support vector)
    💡
    The hyperplane is defined by only the few closest data points

    Predictions with SVMs

    Naïve Approach

Compute $w^\top x + b$; assign class $+1$ if positive, class $-1$ if negative

    Better Approach
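Substitute $w = \sum_i \alpha_i y^{(i)} x^{(i)}$ so the prediction only needs inner products with the training points:

$$w^\top x + b = \sum_{i=1}^{n} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$$

Only the terms with $\alpha_i > 0$ (the support vectors) contribute, and the inner product can be replaced with a kernel $K(x^{(i)}, x)$.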

    Summary

    Solution
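Solve the dual for $\alpha^*$, then recover

$$w^* = \sum_{i=1}^{n} \alpha_i^* y^{(i)} x^{(i)}$$

and $b^*$ as above.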

    Prediction
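$$\hat{y} = \operatorname{sign}\!\left( \sum_{i: \alpha_i^* > 0} \alpha_i^* y^{(i)} K(x^{(i)}, x) + b^* \right)$$

A minimal sketch of this pipeline using scikit-learn's SVC; the toy blob data and the RBF width $\gamma = 0.5$ are assumptions for illustration. It checks that the kernelized prediction above matches the library's decision function:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two Gaussian blobs (assumed for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.6, (20, 2)), rng.normal(+1, 0.6, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# Soft-margin SVM with an RBF kernel; C is the slack penalty
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

# The dual solution is sparse: only support vectors have alpha_i > 0
print("support vectors:", len(clf.support_vectors_), "out of", len(X))

def rbf(A, B, gamma=0.5):
    # Kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Reproduce the decision function: sum_i alpha_i y_i K(x_i, x) + b
# (sklearn's dual_coef_ stores alpha_i * y_i for the support vectors)
manual = clf.dual_coef_ @ rbf(clf.support_vectors_, X) + clf.intercept_
assert np.allclose(manual.ravel(), clf.decision_function(X))
```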

    When Linear Separability Does Not Exist

Some constraints are violated for every choice of $(w, b)$
⇒ Geometric margin loses its meaning

Solution 1: Find $(w, b)$ s.t. the min # of constraints are violated

    ⇒ NP-Hard Problem

    Solution 2: Minimize Slack Penalty

If the functional margin is ≥ 1, then no penalty
If the functional margin is < 1, then pay a linear penalty
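$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y^{(i)}(w^\top x^{(i)} + b) \ge 1 - \xi_i, \; \xi_i \ge 0$$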

    Hinge Loss

    This is the tightest convex upper bound of the intractable 0/1 loss

    Intuition

$y$ and $w^\top x + b$ have the same sign ⇒ correct classification ⇒ loss is 0 once the margin exceeds 1
different sign ⇒ incorrect classification ⇒ loss grows linearly
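$$\ell_{\text{hinge}}(z) = \max(0,\, 1 - z), \qquad z = y \, (w^\top x + b)$$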

    Formulation

If $C \to \infty$, then you have to separate the data ⇒ linearly separable solution only
If $C = 0$, then the slack is free and the data is completely ignored
Optimization over Hinge Loss with an $L_2$ regularization term:
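$$\min_{w, b} \; \sum_{i=1}^{n} \max\!\left(0,\, 1 - y^{(i)}(w^\top x^{(i)} + b)\right) + \lambda \|w\|^2$$

(equivalent to the slack formulation above with $\lambda = \frac{1}{2C}$)

A minimal numpy sketch of minimizing this objective by subgradient descent; the toy data, step size, and $\lambda = 0.1$ are assumptions for illustration:

```python
import numpy as np

def hinge_objective(w, b, X, y, lam=0.1):
    # sum_i max(0, 1 - y_i (w.x_i + b)) + lam * ||w||^2
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins).sum() + lam * (w @ w)

def hinge_subgradient(w, b, X, y, lam=0.1):
    # Points with margin < 1 contribute -y_i x_i (and -y_i for b)
    active = y * (X @ w + b) < 1.0
    gw = -(y[active, None] * X[active]).sum(axis=0) + 2 * lam * w
    gb = -y[active].sum()
    return gw, gb

# A few subgradient steps on toy 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.5, (10, 2)), rng.normal(+1, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [+1.0] * 10)
w, b = np.zeros(2), 0.0
for _ in range(200):
    gw, gb = hinge_subgradient(w, b, X, y)
    w, b = w - 0.01 * gw, b - 0.01 * gb
print(hinge_objective(w, b, X, y))
```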
     

    Loss Functions

    Understanding loss functions : Hinge loss | by Kunal Chowdhury | Analytics Vidhya | Medium