
Gradient-Based Learning


As with other machine learning models, to apply gradient-based learning we must choose a cost function, and we must choose how to represent the output of the model. The largest difference between simple machine learning models and neural networks is that the nonlinearity of a neural network causes most interesting loss functions to become non-convex. This means that neural networks are usually trained with iterative, gradient-based optimizers that merely drive the cost function to a very low value, rather than with the exact linear equation solvers used to train linear regression models or the convex optimization algorithms used for logistic regression or SVMs.
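To make the iterative, gradient-based approach concrete, here is a minimal sketch of gradient descent fitting a 1-D linear model by repeatedly stepping against the gradient of a mean squared error cost. The data, learning rate, and step count are illustrative choices, not from any particular library.

```python
# Minimal sketch of iterative gradient-based optimization (gradient descent).
# Instead of solving the linear-regression normal equations exactly, we
# repeatedly take small steps downhill on the cost surface.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1

w, b = 0.0, 0.0             # parameters, initialized arbitrarily
lr = 0.05                   # learning rate (illustrative value)

for step in range(2000):
    n = len(xs)
    # gradients of the cost J = (1/n) * sum((w*x + b - y)^2)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w        # step against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # approaches w = 2.0, b = 1.0
```

On this convex toy problem gradient descent finds the exact solution; on a non-convex neural-network loss, the same loop merely drives the cost to a low value.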

Cost Functions

A cost function measures how well a machine learning model performs on a given dataset. It quantifies the difference between the expected values and the predicted values and represents that difference as a single real number.

Types of Cost Function

  1. Regression Cost Functions
    • Mean Error
    • Mean Squared Error
    • Mean Absolute Error
  2. Binary Classification Cost Functions
  3. Multi-class Classification Cost Functions
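The three regression cost functions above can be sketched in a few lines; the data here are made-up examples, not from any dataset.

```python
# Illustrative implementations of the regression cost functions listed above.

def mean_error(y_true, y_pred):
    # signed errors can cancel each other out, which is why this
    # cost is rarely used on its own
    return sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # squaring penalizes large errors more heavily
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # absolute value treats all error magnitudes linearly
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

print(mean_error(y_true, y_pred))           # -0.25 (errors partly cancel)
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```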

In most cases, our parametric model defines a distribution p(y | x;θ ) and we simply use the principle of maximum likelihood. This means we use the cross-entropy between the training data and the model’s predictions as the cost function.
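A small sketch of this cross-entropy cost for a classifier: each prediction is a probability distribution over classes, and the cost averages −log p_model(y | x) over the examples. The labels and probabilities below are made-up.

```python
import math

# Cross-entropy (average negative log-likelihood) between training labels
# and a model's predicted class distributions.

def cross_entropy(true_labels, predicted_probs):
    # average of -log p_model(y | x) over the training examples
    return -sum(math.log(p[y])
                for y, p in zip(true_labels, predicted_probs)) / len(true_labels)

true_labels = [0, 1, 1]          # correct class index for each example
predicted_probs = [[0.9, 0.1],   # p_model(y | x) for each example (made up)
                   [0.2, 0.8],
                   [0.4, 0.6]]

print(cross_entropy(true_labels, predicted_probs))  # ≈ 0.2798
```

Confident, correct predictions (probability near 1 on the true class) contribute almost nothing to the cost; confident wrong ones are penalized heavily.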

Sometimes, rather than predicting a complete probability distribution over y, we merely predict some statistic of y conditioned on x. Specialized loss functions allow us to train a predictor of these estimates.
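A brute-force sketch of which statistic a loss trains you to predict: minimizing squared error over a constant prediction recovers the mean of y, while minimizing absolute error recovers the median. The data and search grid are illustrative.

```python
# Which statistic of y does each loss recover? Search a grid of constant
# predictions c and see which minimizes each loss.

ys = [1.0, 2.0, 2.0, 3.0, 10.0]            # note the outlier at 10
candidates = [c / 100 for c in range(0, 1101)]  # grid from 0.00 to 11.00

# squared error is minimized at the mean of ys
best_sq = min(candidates, key=lambda c: sum((y - c) ** 2 for y in ys))

# absolute error is minimized at the median of ys
best_abs = min(candidates, key=lambda c: sum(abs(y - c) for y in ys))

print(best_sq)   # 3.6, the mean of ys
print(best_abs)  # 2.0, the median of ys
```

This is why mean absolute error is more robust to outliers: the median ignores the outlier at 10 that drags the mean up.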

The total cost function used to train a neural network will often combine one of the primary cost functions described here with a regularization term.
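A minimal sketch of such a combined cost, using mean squared error as the primary cost and an L2 weight-decay penalty as the regularization term; the `lam` coefficient is a hypothetical hyperparameter.

```python
# Total cost = primary cost (MSE here) + regularization term (L2 weight decay).

def total_cost(weights, y_true, y_pred, lam=0.01):
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    l2 = lam * sum(w ** 2 for w in weights)   # penalizes large weights
    return mse + l2

# illustrative weights, targets, and predictions
print(total_cost([0.5, -1.0], [1.0, 2.0], [0.9, 2.2]))  # 0.025 + 0.0125 = 0.0375
```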

Learning Conditional Distributions with Maximum Likelihood

Most modern neural networks are trained using maximum likelihood. This means that the cost function is simply the negative log-likelihood, equivalently described as the cross-entropy between the training data and the model distribution. This cost function is given by:

J(θ) = −𝔼_{x,y∼p̂_data} log p_model(y | x).

The specific form of the cost function changes from model to model, depending on the specific form of log p_model.
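As an example of how the form of the cost follows from log p_model: if p_model(y | x) is a Gaussian with predicted mean ŷ and fixed unit variance, the negative log-likelihood is squared error plus a constant, so maximum likelihood recovers the familiar squared-error cost. A quick numerical check:

```python
import math

# For a unit-variance Gaussian p_model(y | x) = N(y; yhat, 1), the negative
# log-likelihood is 0.5*(y - yhat)^2 + 0.5*log(2*pi): squared error plus a
# constant that does not affect optimization.

def gaussian_nll(y, yhat):
    return 0.5 * (y - yhat) ** 2 + 0.5 * math.log(2 * math.pi)

def half_squared_error(y, yhat):
    return 0.5 * (y - yhat) ** 2

y, yhat = 3.0, 2.5   # illustrative target and prediction
diff = gaussian_nll(y, yhat) - half_squared_error(y, yhat)
print(diff)  # constant 0.5 * log(2*pi), independent of y and yhat
```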

An advantage of this approach of deriving the cost function from maximum likelihood is that it removes the burden of designing cost functions for each model. Specifying a model p(y | x) automatically determines a cost function −log p(y | x).
