Large-Scale Deep Learning
- Fast CPU implementation
 
- GPU implementation
 
- Large-scale distributed implementation
- Data parallelism: multiple machines each process a different subset of the examples, e.g., by averaging gradients across workers (see the sketch after this list)
 
- Model parallelism: multiple machines each compute a different part of the model for the same example
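
A minimal sketch of synchronous data parallelism on a toy linear model, assuming NumPy; the worker count, learning rate, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression gradient; w is the model, (X, y) a data batch.
def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

w = np.zeros(3)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)

# Data parallelism: each "worker" gets a shard of the batch, computes a
# gradient on its shard, and the gradients are averaged (synchronous SGD).
n_workers = 4
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
g = np.mean([grad(w, Xs, ys) for Xs, ys in shards], axis=0)
w -= 0.01 * g  # one synchronous parameter update
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient; a real system runs the per-shard computations on separate machines rather than in a Python loop.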
 
 
- Model compression: replace a large, accurate model with a smaller model that needs less memory and runtime, e.g., by training the small model to match the large model's outputs
 
- Dynamic structure. Data-processing systems can dynamically determine which subset of many neural networks should be run on a given input.
- Gater to pick an expert network (see the sketch after this list)
 
- Switch formed from a hidden unit
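
Below is a minimal sketch of the gater idea, assuming linear "expert" networks and hard (argmax) selection so that only the chosen expert is evaluated; all sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

d_in, d_out, n_experts = 4, 2, 3  # illustrative sizes

# Each expert is a small linear network; the gater is another one
# whose output scores the experts for the current input.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
gater_w = rng.normal(size=(d_in, n_experts))

def forward(x):
    scores = softmax(x @ gater_w)    # gater's confidence in each expert
    winner = int(np.argmax(scores))  # hard selection: run only one expert
    return x @ experts[winner], winner

y, chosen = forward(rng.normal(size=d_in))
print(f"gater chose expert {chosen}; output {y}")
```

Hard selection keeps the computation proportional to one expert; a soft gater would instead mix all experts' outputs, which is more expressive but forgoes the speedup.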
 
 
- Specialized Hardware Implementation of Deep Networks
 
Computer Vision
- Preprocessing
- Contrast normalization: standardize the amount of contrast across samples.
- Global contrast normalization (GCN) prevents images from having varying amounts of contrast by subtracting the mean from each image, then rescaling it so that the standard deviation across its pixels equals some constant s (a code sketch follows this list).
 
- Local contrast normalization: normalize contrast within each small window of the image rather than over the image as a whole
 
 
- Dataset augmentation
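
A minimal sketch of GCN, assuming NumPy; the parameter names s, lam, and eps are illustrative (a scale constant, a regularizer for near-constant images, and a guard against division by zero):

```python
import numpy as np

def global_contrast_normalize(img, s=1.0, lam=0.0, eps=1e-8):
    """Subtract the image's mean, then rescale so the standard
    deviation across its pixels equals the constant s."""
    img = img.astype(np.float64) - img.mean()
    norm = np.sqrt(lam + (img ** 2).mean())  # = pixel std when lam == 0
    return s * img / max(eps, norm)

rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=(32, 32, 3))    # a random stand-in "image"
y = global_contrast_normalize(x)
print(y.mean(), y.std())                     # approximately 0 and s
```

With lam = 0 the pixel standard deviation becomes exactly s; a small positive lam keeps low-contrast, near-constant images from being amplified into noise.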
 
 
Speech Recognition
Modern speech recognition systems largely use deep networks, e.g., convolutional networks that treat the input spectrogram as a two-dimensional image.
Natural Language Processing
N-Grams
$$
P(x_1, \ldots, x_\tau) = P(x_1, \ldots, x_{n-1}) \prod_{t=n}^{\tau} P(x_t \mid x_{t-n+1}, \ldots, x_{t-1})
$$
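
The conditionals are typically estimated by counting occurrences in a training corpus. A minimal maximum-likelihood sketch for n = 2 (bigrams), using a made-up toy corpus:

```python
from collections import Counter

# Toy corpus; real models are trained on much larger text collections.
corpus = "the dog ran away the dog ran home the cat ran away".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of (x_{t-1}, x_t) pairs
prefixes = Counter(corpus[:-1])             # counts of x_{t-1}

def p(word, prev):
    """Maximum-likelihood estimate of P(x_t = word | x_{t-1} = prev)."""
    return bigrams[(prev, word)] / prefixes[prev]

print(p("ran", "dog"))    # 1.0 -- "dog" is always followed by "ran" here
print(p("away", "ran"))   # 2/3
```

In practice such estimates are combined with smoothing or back-off so that n-grams unseen in training do not receive probability zero.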
Neural Language Models
A key ingredient is word embeddings: each word is mapped to a learned dense vector, so words used in similar contexts end up with nearby representations (sketch below).
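
A minimal sketch of an embedding lookup, assuming NumPy; the vocabulary and dimension are illustrative, and the matrix here is random rather than learned:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "dog": 1, "cat": 2}  # illustrative vocabulary
E = rng.normal(size=(len(vocab), 8))    # one 8-d vector per word; learned in practice

def embed(word):
    return E[vocab[word]]               # embedding = row lookup

print(embed("dog").shape)               # (8,)
```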
High-Dimensional Outputs
- Use of a short list: restrict the neural net's softmax to the 10,000-20,000 most frequent words; the remaining rare words are handled separately (e.g., by an n-gram model)
 
- Hierarchical softmax: place the words at the leaves of a binary tree and compute a word's probability as the product of binary decisions along its path, reducing the cost of a prediction from O(|V|) to O(log |V|); see the sketch below.
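
A minimal forward-pass sketch of hierarchical softmax over a four-word vocabulary, assuming NumPy and a hand-built balanced tree; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Four words at the leaves of a balanced binary tree with three internal
# nodes; each internal node owns one parameter vector.
d = 5
node_w = rng.normal(size=(3, d))

# Path to each leaf as (internal-node index, go-right bit).
paths = {
    "dog":  [(0, 0), (1, 0)],
    "cat":  [(0, 0), (1, 1)],
    "ran":  [(0, 1), (2, 0)],
    "away": [(0, 1), (2, 1)],
}

def p_word(word, h):
    """P(word | context) as a product of binary decisions along the
    tree path: O(log V) sigmoids instead of one O(V) softmax."""
    prob = 1.0
    for node, right in paths[word]:
        p_right = sigmoid(node_w[node] @ h)
        prob *= p_right if right else 1.0 - p_right
    return prob

h = rng.normal(size=d)                   # context vector from the network
print(sum(p_word(w, h) for w in paths))  # sums to 1.0 over the vocabulary
```

Each internal node contributes one sigmoid, so scoring a word costs O(log |V|) dot products; the price is that the tree structure itself must be chosen well for good statistical performance.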