Machine Learning vs Pure Optimization
- Pure optimization minimizes an objective J for its own sake; in ML, minimizing training cost is only a proxy for reducing test error, so training may halt early (e.g. early stopping on validation error) rather than at a minimum
- The ML objective is typically a cost function averaged over all training samples (the empirical risk); because it decomposes as a sum over examples, the gradient can be estimated from a single example or a small minibatch at a time
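The averaged objective above can be written out explicitly; a standard formulation (notation follows the usual convention for the training distribution and loss, not anything stated in these notes):

```latex
J(\theta)
= \mathbb{E}_{(x,y)\sim \hat{p}_{\text{data}}}\, L\bigl(f(x;\theta),\, y\bigr)
= \frac{1}{m} \sum_{i=1}^{m} L\bigl(f(x^{(i)};\theta),\, y^{(i)}\bigr)
```

The sum-over-examples form is what makes minibatch gradient estimates unbiased: the gradient of the average is the average of the per-example gradients.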
Challenges in Neural Network Optimization
- Ill-conditioning of the Hessian matrix: even very small gradient steps can increase the cost, so SGD gets stuck taking tiny steps
- Local Minima
- Plateaus, saddle points and other flat regions
- Cliffs and exploding gradients
- Long-term dependencies: very deep computational graphs (e.g. unrolled recurrent networks) multiply repeatedly by the same weights, making gradients vanish or explode
- Inexact gradients: in practice the gradient (and Hessian) is estimated from minibatch samples, so updates are based on noisy or even biased estimates
- Poor correspondence between local and global structure
- Theoretical limits of optimization
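To illustrate the ill-conditioning point above, a minimal sketch (the quadratic and its condition number are invented for illustration): plain gradient descent must use a step size small enough for the steepest direction, so progress along the flat direction is painfully slow.

```python
import numpy as np

# Toy quadratic f(w) = 0.5 * w^T H w with condition number 1000.
H = np.diag([1.0, 1000.0])
w = np.array([1.0, 1.0])
lr = 1.0 / 1000.0  # step size limited by the largest curvature
for _ in range(1000):
    w = w - lr * (H @ w)
# The steep coordinate is solved almost immediately, but after 1000 steps
# the flat coordinate has only decayed by (1 - 1/1000)^1000, about 1/e.
```

This is the "stuck taking tiny steps" behavior: the loss still decreases, but along the low-curvature direction the effective progress per step is almost zero.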
Basic Algorithms
Stochastic Gradient Descent

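A minimal sketch of SGD on a toy problem (the 1-D objective and hyperparameters are made up for illustration): one gradient step per training example, using that example's gradient as a noisy estimate of the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad, theta, data, lr=0.05, epochs=200):
    """Plain SGD: one update per training example, in shuffled order."""
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            theta = theta - lr * grad(theta, x)  # noisy single-sample gradient
    return theta

# Toy objective J(theta) = mean over x of (theta - x)^2,
# whose minimizer is the mean of the data.
data = np.array([1.0, 2.0, 3.0, 4.0])
grad = lambda theta, x: 2.0 * (theta - x)
theta = sgd(grad, 0.0, data, lr=0.05, epochs=200)  # ends near 2.5
```

With a fixed learning rate the iterate never settles exactly at the minimum; it fluctuates around it, which is why SGD in practice decays the learning rate over time.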
Momentum


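A minimal sketch of the momentum update (hyperparameters invented for illustration): a velocity term accumulates an exponentially decaying average of past gradients, which speeds progress along consistently downhill directions.

```python
import numpy as np

def momentum(grad, theta, lr=0.01, alpha=0.9, steps=500):
    """Gradient descent with momentum: v accumulates past gradients."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = alpha * v - lr * grad(theta)  # decay old velocity, add new step
        theta = theta + v
    return theta

# Toy ill-conditioned quadratic (chosen for illustration): momentum damps
# oscillation along the steep axis while accelerating the flat one.
H = np.diag([1.0, 10.0])
theta = momentum(lambda w: H @ w, np.array([1.0, 1.0]))
```

The effective step size along a persistent gradient direction grows to roughly lr / (1 - alpha), so alpha = 0.9 behaves like a 10x larger step without the instability of simply raising lr.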
Nesterov Momentum
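A minimal sketch of the Nesterov variant (same toy quadratic and invented hyperparameters as above): the only change from standard momentum is that the gradient is evaluated at the look-ahead point theta + alpha * v, i.e. after the velocity has been applied.

```python
import numpy as np

def nesterov(grad, theta, lr=0.01, alpha=0.9, steps=500):
    """Nesterov momentum: gradient is evaluated at the look-ahead point."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = alpha * v - lr * grad(theta + alpha * v)  # look-ahead gradient
        theta = theta + v
    return theta

# Same toy ill-conditioned quadratic as in the momentum sketch.
H = np.diag([1.0, 10.0])
theta = nesterov(lambda w: H @ w, np.array([1.0, 1.0]))
```

The look-ahead acts as a correction: if the velocity is about to overshoot, the gradient at the look-ahead point already points back, so Nesterov momentum tends to oscillate less than standard momentum.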