$$ f(x; W, c, w, b) = w^\top \max\{0, W^\top x + c\} + b $$
$$ W=\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix} $$
$$ c=\begin{bmatrix} 0\\ -1\end{bmatrix} $$
$$ w=\begin{bmatrix} 1\\ -2 \end{bmatrix} $$
$$ b=0 $$
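As a quick sanity check, here is a minimal NumPy sketch (the array names simply mirror the symbols above) that evaluates this rectifier network on all four binary inputs; with these parameters it reproduces the XOR truth table.

```python
import numpy as np

# Parameters of the one-hidden-layer rectifier network above
W = np.array([[1., 1.],
              [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])
b = 0.

def f(x):
    # f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
    h = np.maximum(0., W.T @ x + c)  # ReLU hidden layer
    return w @ h + b

for x in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
    print(x, f(np.array(x)))  # prints 0, 1, 1, 0: the XOR outputs
```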
$$ J(\theta) = -\mathbb{E}_{x, y \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(y \mid x) $$
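A minimal sketch of this cost, assuming the model outputs a per-class probability vector for each example and approximating the expectation over $\hat{p}_{\text{data}}$ by the empirical mean over a batch; the function name and array shapes are my own choices for illustration.

```python
import numpy as np

def nll(probs, y):
    # Mean negative log-likelihood of labels y under the predicted class probabilities.
    # probs: (N, K) array, each row is p_model(y | x) over K classes
    # y:     (N,) integer labels drawn together with x from the data distribution
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

# Tiny example: two samples, two classes
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
y = np.array([0, 1])
print(nll(probs, y))  # -(log 0.9 + log 0.8) / 2 ≈ 0.164
```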
Output Units:
Hidden Units:
Logistic Sigmoid and Hyperbolic Tangent. These activation functions are closely related because $\tanh(z) = 2\sigma(2z) - 1$.
Radial basis function (RBF) unit: $h_i = \exp\left(-\frac{1}{\sigma_i^2}\,\|W_{:,i} - x\|^2\right)$. This function becomes more active as $x$ approaches a template $W_{:,i}$. Because it saturates to 0 for most $x$, it can be difficult to optimize.
Softplus: $g(a) = \zeta(a) = \log(1 + e^a)$. This is a smooth version of the rectifier. The use of the softplus is generally discouraged. The softplus demonstrates that the performance of hidden unit types can be very counterintuitive: one might expect it to have an advantage over the rectifier due to being differentiable everywhere or due to saturating less completely, but empirically it does not.
Hard tanh. This is shaped similarly to the tanh and the rectifier, but unlike the latter, it is bounded: $g(a) = \max(-1, \min(1, a))$. Reference sketches of these hidden unit nonlinearities follow below.
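Here is a minimal NumPy sketch of these hidden unit nonlinearities; the function names and the explicit `sigma` argument of the RBF unit are assumptions for illustration, not a fixed API.

```python
import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

def tanh(z):
    # Related to the sigmoid by tanh(z) = 2*sigmoid(2z) - 1
    return 2. * sigmoid(2. * z) - 1.

def rbf_unit(x, template, sigma):
    # h_i = exp(-||W_{:,i} - x||^2 / sigma_i^2); most active when x is near the template column W_{:,i}
    return np.exp(-np.sum((template - x) ** 2) / sigma ** 2)

def softplus(a):
    # Smooth version of the rectifier: zeta(a) = log(1 + e^a)
    return np.log1p(np.exp(a))

def hard_tanh(a):
    # Bounded piecewise-linear unit: g(a) = max(-1, min(1, a))
    return np.clip(a, -1., 1.)

z = np.linspace(-3., 3., 7)
print(np.allclose(tanh(z), np.tanh(z)))  # True: checks the tanh/sigmoid identity
```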


