Noise applied to inputs is a form of data augmentation. For some models, the addition of noise with extremely small variance at the input is equivalent to imposing a penalty on the norm of the weights.
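A minimal NumPy sketch of this idea (the function name and variance value are illustrative, not from the text): zero-mean Gaussian noise is added to each input batch before a training step, leaving the labels untouched.

```python
import numpy as np

def add_input_noise(X, variance=0.01, rng=None):
    """Return a noisy copy of the input batch X (data augmentation).

    Zero-mean Gaussian noise with small variance is added independently
    to every input feature; the labels are left unchanged.
    """
    if rng is None:
        rng = np.random.default_rng()
    return X + rng.normal(loc=0.0, scale=np.sqrt(variance), size=X.shape)

# Usage: augment a batch before each training step.
X_batch = np.random.rand(32, 10)          # 32 examples, 10 features
X_noisy = add_input_noise(X_batch, variance=0.01)
```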
Noise applied to hidden units: noise injection can be much more powerful than simply shrinking the parameters. Noise applied to hidden units is important enough that dropout can be seen as the main development of this approach.
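As a rough illustration of noise on hidden units, here is a sketch of inverted dropout applied to a batch of hidden activations (a common formulation, not necessarily the exact variant intended by the text):

```python
import numpy as np

def dropout(h, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout on a batch of hidden activations h.

    During training each unit is zeroed with probability drop_prob and the
    survivors are scaled by 1 / (1 - drop_prob) so the expected activation
    is unchanged; at test time the activations pass through untouched.
    """
    if not training or drop_prob == 0.0:
        return h
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) >= drop_prob
    return h * mask / (1.0 - drop_prob)
```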
Adding Noise to Weights: this technique is primarily used with recurrent neural networks (RNNs). It can be interpreted as a stochastic implementation of Bayesian inference over the weights. Bayesian learning considers the model weights to be uncertain and representable via a probability distribution p(w) that reflects this uncertainty; adding noise to the weights is a practical, stochastic way to express it.
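One way to realize this in code, sketched here with NumPy and hypothetical names, is to sample a fresh Gaussian perturbation of every weight matrix at each training step before the forward pass:

```python
import numpy as np

def perturb_weights(weights, eta=1e-4, rng=None):
    """Return copies of the weight arrays with Gaussian noise N(0, eta*I) added.

    Called once per input presentation, this gives a stochastic, sample-based
    reflection of uncertainty over the weights.
    """
    if rng is None:
        rng = np.random.default_rng()
    return [W + rng.normal(0.0, np.sqrt(eta), size=W.shape) for W in weights]

# Usage inside a training step (forward/backward pass not shown):
weights = [np.random.randn(10, 20), np.random.randn(20, 1)]
noisy_weights = perturb_weights(weights, eta=1e-4)
```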
Noise applied to weights can also be interpreted as equivalent to a more traditional form of regularization that encourages stability of the learned function. This can be seen in a regression setting: we train a model ŷ(x) to map x to a scalar, using the least squares cost between the model prediction ŷ(x) and the true values y.
The training set consists of m labeled examples {(x(1), y(1)), . . . , (x(m), y(m))}, and the objective is J = E[(ŷ(x) − y)²]. For each input presentation we also add a random perturbation ε_W ~ N(0, ηI) to the network weights. For small η, minimizing the perturbed objective is equivalent to minimizing J with an additional regularization term η E[||∇_W ŷ(x)||²]. This term encourages the parameters to move to regions where small perturbations of the weights have a small influence on the output.
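A small numerical check of this equivalence, sketched for a linear model ŷ(x) = w · x (the values and sample counts are illustrative): for a linear model ∇_w ŷ(x) = x, so the expected squared error under weight noise should match the clean squared error plus η ||x||².

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 1e-4                          # weight-noise variance (small)
w = rng.normal(size=5)              # linear model y_hat = w . x
x = rng.normal(size=5)              # one training input
y = 1.3                             # its target

# Monte Carlo estimate of the expected squared error when the weights are
# perturbed with eps_W ~ N(0, eta * I) at every presentation.
eps = rng.normal(scale=np.sqrt(eta), size=(200_000, w.size))
noisy_losses = ((w + eps) @ x - y) ** 2
mc_estimate = noisy_losses.mean()

# Equivalent penalized objective: clean squared error plus
# eta * ||grad_w y_hat(x)||^2, which for a linear model is eta * ||x||^2.
clean_loss = ((w @ x) - y) ** 2
penalized = clean_loss + eta * np.sum(x ** 2)

print(mc_estimate, penalized)       # the two values agree closely
```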
Injecting Noise at the Output Targets: most datasets have some amount of mistakes in the y labels, and it can be harmful to maximize log p(y | x) when y is a mistake. This can be incorporated explicitly into the cost function. For example, label smoothing regularizes a model based on a softmax with k output values by replacing the hard 0 and 1 classification targets with targets of ε/(k-1) and 1 − ε, respectively.
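A short sketch of constructing such smoothed targets (function name and ε value are illustrative): the correct class gets 1 − ε and each of the remaining k − 1 classes gets ε/(k − 1).

```python
import numpy as np

def smooth_labels(y, k, eps=0.1):
    """Label-smoothed targets for k-class softmax outputs.

    Hard one-hot targets are replaced with eps / (k - 1) for the incorrect
    classes and 1 - eps for the correct class.
    """
    targets = np.full((len(y), k), eps / (k - 1))
    targets[np.arange(len(y)), y] = 1.0 - eps
    return targets

# Usage: three examples with integer class labels out of k = 4 classes.
print(smooth_labels(np.array([0, 2, 3]), k=4, eps=0.1))
```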