For linear regression, we used the squared error cost function:
J(w,b) = (1/2m) * Σ(f(x) - y)²
However, when f(x) is the sigmoid function used in logistic regression, the squared error cost becomes non-convex, with flat regions and local minima that make gradient descent unreliable.
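One way to see the difference (the toy data and the one-dimensional weight sweep below are assumptions made purely for illustration) is to evaluate both costs along a single weight direction and plot them:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D binary classification data (illustrative values only).
x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])

# Sweep a single weight w (bias fixed at 0) and evaluate both costs.
ws = np.linspace(-10, 10, 400)
squared_error = [np.mean((sigmoid(w * x) - y) ** 2) / 2 for w in ws]
logistic = [np.mean(-y * np.log(sigmoid(w * x))
                    - (1 - y) * np.log(1 - sigmoid(w * x))) for w in ws]

# The squared-error curve flattens out where the sigmoid saturates,
# while the logistic cost traces a single convex bowl.
plt.plot(ws, squared_error, label="squared error with sigmoid")
plt.plot(ws, logistic, label="logistic loss")
plt.xlabel("w")
plt.ylabel("J(w, 0)")
plt.legend()
plt.show()
```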
Instead of squared error, we define a new loss function for a single training example:
If y = 1: Loss = -log(f(x))
If y = 0: Loss = -log(1 - f(x))
When y = 1, the loss is near 0 if f(x) is close to 1 and grows without bound as f(x) approaches 0, so the algorithm is pushed to output high probabilities for positive examples.
When y = 0, the roles reverse: the loss is small when f(x) is near 0, pushing the algorithm toward low probabilities for negative examples.
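A minimal sketch of this per-example loss, assuming f_x holds the model's predicted probability that the label is 1 and y is the true label (both names are mine, not from the original):

```python
import math

def logistic_loss(f_x, y):
    """Loss for a single training example.

    f_x: predicted probability that the label is 1.
    y:   true label, 0 or 1.
    """
    if y == 1:
        return -math.log(f_x)
    return -math.log(1 - f_x)

print(logistic_loss(0.8, 1))   # ~0.22: high probability on a positive example, small loss
print(logistic_loss(0.8, 0))   # ~1.61: high probability on a negative example, larger loss
```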
Convex Function
Creates a convex cost surface, ensuring gradient descent finds the global minimum
Penalizes Wrong Predictions
Large penalties for confident but incorrect predictions (see the numeric sketch after this list)
Smooth Gradients
Provides smooth gradients for reliable optimization
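To make the "large penalties" claim concrete, here is a small computation of the loss for a positive example (y = 1) at a few illustrative predicted probabilities:

```python
import math

# Loss -log(f(x)) for a positive example at several predicted probabilities.
for f_x in (0.9, 0.5, 0.1, 0.01):
    print(f"f(x) = {f_x:.2f}  loss = {-math.log(f_x):.2f}")
# f(x) = 0.90  loss = 0.11
# f(x) = 0.50  loss = 0.69
# f(x) = 0.10  loss = 2.30
# f(x) = 0.01  loss = 4.61
```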
This cost function is derived from the statistical principle of maximum likelihood estimation, which provides a principled way to find optimal parameters for probabilistic models.
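As a rough sketch of that connection (the tiny dataset below is invented for illustration), the average negative log-likelihood of the labels under the model works out to exactly the average of the per-example losses above:

```python
import numpy as np

# Hypothetical predicted probabilities f(x) and labels y for a tiny dataset.
f = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])
m = len(y)

# Likelihood of the labels under the model: each example contributes
# f(x) if y = 1 and 1 - f(x) if y = 0.
likelihood = np.prod(f**y * (1 - f)**(1 - y))

# Average negative log-likelihood ...
nll = -np.log(likelihood) / m

# ... equals the cost built from the per-example logistic loss.
cost = np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

print(nll, cost)   # the two values agree (up to floating-point error)
```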
The logistic loss function replaces squared error to create a convex optimization problem suitable for binary classification. By heavily penalizing confident but wrong predictions, it encourages the model to output appropriate probabilities for each class, leading to better classification performance.