Machine Learning vs Pure Optimization
- Pure optimization minimizes an objective J for its own sake; in ML, minimizing training cost is only a proxy for reducing test error, so training may halt early (e.g. early stopping on validation error) rather than at a minimum
- The ML objective is typically a cost function averaged over all training samples (the empirical risk); because it decomposes as a sum over examples, the gradient can be estimated from a single example or a small minibatch at a time
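The averaged objective above can be written out explicitly; a standard formulation (notation follows the usual convention for the training distribution and loss, not anything stated in these notes):

```latex
J(\theta)
= \mathbb{E}_{(x,y)\sim \hat{p}_{\text{data}}}\, L\bigl(f(x;\theta),\, y\bigr)
= \frac{1}{m} \sum_{i=1}^{m} L\bigl(f(x^{(i)};\theta),\, y^{(i)}\bigr)
```

The sum-over-examples form is what makes minibatch gradient estimates unbiased: the gradient of the average is the average of the per-example gradients.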
Challenges in Neural Network Optimization
- Ill-conditioning of the Hessian matrix: even very small gradient steps can increase the cost, so SGD gets stuck taking tiny steps
- Local Minima
- Plateaus, saddle points and other flat regions
- Cliffs and exploding gradients
- Long-term dependencies: very deep computational graphs (e.g. unrolled recurrent networks) multiply repeatedly by the same weights, making gradients vanish or explode
- Inexact gradients: in practice the gradient (and Hessian) is estimated from minibatch samples, so updates are based on noisy or even biased estimates
- Poor correspondence between local and global structure
- Theoretical limits of optimization
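To illustrate the ill-conditioning point above, a minimal sketch (the quadratic and its condition number are invented for illustration): plain gradient descent must use a step size small enough for the steepest direction, so progress along the flat direction is painfully slow.

```python
import numpy as np

# Toy quadratic f(w) = 0.5 * w^T H w with condition number 1000.
H = np.diag([1.0, 1000.0])
w = np.array([1.0, 1.0])
lr = 1.0 / 1000.0  # step size limited by the largest curvature
for _ in range(1000):
    w = w - lr * (H @ w)
# The steep coordinate is solved almost immediately, but after 1000 steps
# the flat coordinate has only decayed by (1 - 1/1000)^1000, about 1/e.
```

This is the "stuck taking tiny steps" behavior: the loss still decreases, but along the low-curvature direction the effective progress per step is almost zero.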
Basic Algorithms
Stochastic Gradient Descent

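A minimal sketch of SGD on a toy problem (the 1-D objective and hyperparameters are made up for illustration): one gradient step per training example, using that example's gradient as a noisy estimate of the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad, theta, data, lr=0.05, epochs=200):
    """Plain SGD: one update per training example, in shuffled order."""
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            theta = theta - lr * grad(theta, x)  # noisy single-sample gradient
    return theta

# Toy objective J(theta) = mean over x of (theta - x)^2,
# whose minimizer is the mean of the data.
data = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda theta, x: 2.0 * (theta - x)
theta = sgd(grad, 0.0, data, lr=0.05, epochs=200)  # ends near 2.5
```

With a fixed learning rate the iterate never settles exactly at the minimum; it fluctuates around it, which is why SGD in practice decays the learning rate over time.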
Momentum


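A minimal sketch of the momentum update (hyperparameters invented for illustration): a velocity term accumulates an exponentially decaying average of past gradients, which speeds progress along consistently downhill directions.

```python
import numpy as np

def momentum(grad, theta, lr=0.01, alpha=0.9, steps=500):
    """Gradient descent with momentum: v accumulates past gradients."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = alpha * v - lr * grad(theta)  # decay old velocity, add new step
        theta = theta + v
    return theta

# Toy ill-conditioned quadratic (chosen for illustration): momentum damps
# oscillation along the steep axis while accelerating the flat one.
H = np.diag([1.0, 10.0])
theta = momentum(lambda w: H @ w, np.array([1.0, 1.0]))
```

The effective step size along a persistent gradient direction grows to roughly lr / (1 - alpha), so alpha = 0.9 behaves like a 10x larger step without the instability of simply raising lr.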
Nesterov Momentum
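A minimal sketch of the Nesterov variant (same toy quadratic and invented hyperparameters as above): the only change from standard momentum is that the gradient is evaluated at the look-ahead point theta + alpha * v, i.e. after the velocity has been applied.

```python
import numpy as np

def nesterov(grad, theta, lr=0.01, alpha=0.9, steps=500):
    """Nesterov momentum: gradient is evaluated at the look-ahead point."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = alpha * v - lr * grad(theta + alpha * v)  # look-ahead gradient
        theta = theta + v
    return theta

# Same toy ill-conditioned quadratic as in the momentum sketch.
H = np.diag([1.0, 10.0])
theta = nesterov(lambda w: H @ w, np.array([1.0, 1.0]))
```

The look-ahead acts as a correction: if the velocity is about to overshoot, the gradient at the look-ahead point already points back, so Nesterov momentum tends to oscillate less than standard momentum.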