Created: Oct 19, 2021 06:24 PM

Topics: XOR, Backpropagation Algorithm


## Lecture Outline

- Motivation behind neural networks

- The XOR problem

- Multi-Layer Perceptron model

- Backpropagation algorithm

- Regularization and other practical techniques in NNs

## Artificial Neuron

## OR vs XOR

OR is linearly separable, so a single neuron (one linear decision boundary) can represent it; XOR is not linearly separable, so a single neuron cannot, and at least one hidden layer is needed.

## Activation Functions

Introduce non-linearity to the model

- Sigmoid

- ReLU
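
For reference, the standard definitions of the two activations listed above:

$$
\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad \mathrm{ReLU}(a) = \max(0, a)
$$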

Choosing an activation function: an art, not a science

- Stick to one activation function for all layers

## Two-Layer Neural Network

Single-hidden-layer network

Potentially overfits the dataset

Can represent any decision boundary (linear or non-linear)

- number of hidden layers (depth of network) ⇒ number of hyperplanes
- Deep learning: many hidden layers

- Hidden layer: any activation function (sigmoid, ReLU, etc.)

For regression tasks: identity function as the output activation

For binary classification tasks: sigmoid as the output activation

### Model
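
A minimal sketch of the two-layer model; the notation here is assumed (x: inputs, z: hidden activations, h: hidden activation function, σ: output activation, W^(1), W^(2): weight matrices):

$$
z_j = h\left(\sum_i w^{(1)}_{ji} x_i + b^{(1)}_j\right), \qquad
y_k = \sigma\left(\sum_j w^{(2)}_{kj} z_j + b^{(2)}_k\right)
$$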

### Weight Matrix

Shape of the weight matrix W:

- rows ⇒ # neurons in the next layer

- columns ⇒ # neurons in the current layer

Each row in W corresponds to one unit (output) of the next layer
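
A small numpy sketch of this shape convention; the layer sizes are made up for illustration:

```python
import numpy as np

# Made-up layer sizes: 3 units in the current layer feeding 4 units in the next layer
n_current, n_next = 3, 4

W = np.random.randn(n_next, n_current)   # rows = # neurons in next layer, columns = # neurons in current layer
b = np.zeros(n_next)
x = np.random.randn(n_current)           # activations of the current layer

a = W @ x + b                            # pre-activations of the next layer, shape (4,)
z = np.maximum(0.0, a)                   # e.g. ReLU activations of the next layer
```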

## General Neural Network

A directed acyclic graph from left to right

k = the unit whose activation is being computed

j = index over the units that feed into unit k
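
With this indexing, the usual per-unit computation (assuming activation function h and bias b_k) is:

$$
a_k = \sum_j w_{kj} z_j + b_k, \qquad z_k = h(a_k)
$$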

### Universal Approximator

A two-layer neural network with enough hidden units can approximate any continuous function arbitrarily well (universal approximation theorem)

## Training Neural Networks

### Regression Task

#### Likelihood function
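
A typical choice, assuming Gaussian noise with precision β around the network output y(x, w) and i.i.d. training pairs (x_n, t_n):

$$
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(x_n, \mathbf{w}),\ \beta^{-1}\right)
$$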

#### Loss Function
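
Maximizing that likelihood is equivalent to minimizing the sum-of-squares error:

$$
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2
$$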

### Classification Task

#### Cross Entropy Loss
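
For binary targets t_n ∈ {0, 1} and sigmoid outputs y_n = y(x_n, w):

$$
E(\mathbf{w}) = - \sum_{n=1}^{N} \left[ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right]
$$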

### Training Using Gradient Descent

The gradient is hard to compute directly because of the nested, non-linear structure of the model; this motivates the backpropagation algorithm below
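
The update rule itself is the standard gradient-descent step, with learning rate η:

$$
\mathbf{w}^{(\tau + 1)} = \mathbf{w}^{(\tau)} - \eta \, \nabla E\left(\mathbf{w}^{(\tau)}\right)
$$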

## Training Using Backpropagation Algorithm

A message-passing algorithm with two steps:

### ➡️ Forward Pass

Starting from the inputs, successively compute the activations of each unit (see the sketch after this list):

- Apply input to the network

- Compute the activations of all units (hidden & output)

- Evaluate the output error δ_k = y_k − t_k for all output units
    - t_k: ground truth
    - y_k: prediction of the model
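
A minimal numpy sketch of a forward pass through a made-up 2-3-1 network (random weights, just to show the sequence of steps):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Made-up 2-3-1 network, purely for illustration
W1, b1 = np.random.randn(3, 2), np.zeros(3)   # input layer -> hidden layer
W2, b2 = np.random.randn(1, 3), np.zeros(1)   # hidden layer -> output layer

x = np.array([0.5, -1.0])   # input applied to the network
t = np.array([1.0])         # ground truth

a1 = W1 @ x + b1            # hidden pre-activations
z1 = sigmoid(a1)            # hidden activations
a2 = W2 @ z1 + b2           # output pre-activation
y = sigmoid(a2)             # prediction of the model

delta_out = y - t           # output error, consumed by the backward pass
```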

#### Example: Single Neuron Per Layer

**Structure**

**Formulas**
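
A minimal sketch of this example, assuming one weight per layer (no biases) and activation function h:

$$
a_1 = w_1 x, \quad z_1 = h(a_1), \qquad a_2 = w_2 z_1, \quad y = h(a_2)
$$

The chain rule then gives, for example,

$$
\frac{\partial E}{\partial w_1}
= \frac{\partial E}{\partial y} \cdot h'(a_2) \cdot w_2 \cdot h'(a_1) \cdot x
$$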

### ⬅️ Backward Pass

Successively compute the gradients of the error function with respect to the activations and the weights:

- First compute the error δ_k at the output units

- Backpropagate to compute δ_j for all hidden units
    - given the δ values of the units in the following layer and the weights connecting to them

- Evaluate the derivatives of the error w.r.t. the weights (see the formulas below)
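
The standard backpropagation formulas these steps refer to (with the output error written as δ_k = y_k − t_k, which holds for sum-of-squares and for cross-entropy with a matched output activation):

$$
\delta_k = y_k - t_k, \qquad
\delta_j = h'(a_j) \sum_k w_{kj} \, \delta_k, \qquad
\frac{\partial E}{\partial w_{ji}} = \delta_j \, z_i
$$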

## Regularization

### Weight Decay
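
Weight decay adds an L2 penalty on the weights to the error; λ controls its strength:

$$
\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2
$$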

### Early stopping

Stop training as soon as the validation error starts increasing

(Plot: training loss vs. validation loss over training; training loss keeps decreasing while validation loss eventually starts increasing.)

⇒ Equivalent to some kind of weight decay
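
A minimal Python sketch of the stopping rule; `train_step` and `validation_loss` here are made-up stand-ins for the real training update and validation evaluation:

```python
import numpy as np

def train_step(w):
    # Made-up stand-in for one gradient-descent update on the training loss
    return w - 0.1 * 2.0 * (w - 1.2)

def validation_loss(w):
    # Made-up stand-in for evaluating the model on a held-out validation set
    return float(np.sum((w - 1.0) ** 2))

w = np.zeros(3)
best_val, best_w = float("inf"), w.copy()
patience, bad_epochs = 3, 0

for epoch in range(100):
    w = train_step(w)
    val = validation_loss(w)
    if val < best_val:
        best_val, best_w, bad_epochs = val, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation error has started increasing -> stop
            break

w = best_w  # keep the weights from the epoch with the lowest validation error
```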

### Dropout

During each training iteration, remove from the model (turn off) each unit independently with some probability p (a hyperparameter)

Turned off ⇒ the unit's activation is treated as 0 for that iteration

- Validate & test using the full model; scale each unit's output by the frequency at which it was on during training
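
A small numpy sketch of dropout on one layer's activations; the drop probability value is made up for illustration:

```python
import numpy as np

p = 0.5                                  # drop probability (made-up hyperparameter value)
z = np.array([0.2, 1.3, 0.7, 2.1])       # activations of some hidden layer

# Training: each unit is turned off independently with probability p (its output becomes 0)
mask = (np.random.rand(z.shape[0]) >= p).astype(float)
z_train = z * mask

# Validation / test: use the full model, but scale by the keep probability (1 - p)
# so the expected activation matches what the next layer saw during training
z_test = z * (1 - p)
```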

## Convolutional Neural Networks (CNNs)

### Motivation

Neural Network Convention

**Problem:** Output should be **invariant** to translation, rotation, scaling, distortion, elastic deformation, and other arbitrary artifacts

**Naïve Solution**: Fully Connected Networks w/ a large & diverse dataset to obtain invariance

- Each unit of each layer sees the whole image

- Ignores the key property of images: **locality** - nearby pixels tend to have similar values

- Neural networks are **brittle** ⚠️: small variations in the input → big effect on the output

- Information can be merged at later stages to get higher order features about the whole image