Logistic Regression

The Problem with Linear Regression’s Cost Function

For linear regression, we used the squared error cost function:

J(w,b) = (1/(2m)) * Σ (f(x⁽ⁱ⁾) - y⁽ⁱ⁾)²   (summed over the m training examples)

However, when f(x) uses the sigmoid function for logistic regression, this creates a non-convex cost function with many local minima, making gradient descent unreliable.
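
To make the setup concrete, here is a minimal NumPy sketch of the sigmoid model and the squared error cost it would be plugged into. The helper names (sigmoid, predict, squared_error_cost) and the use of NumPy are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b):
    """Model output f(x) = sigmoid(w · x + b) for each row of X."""
    return sigmoid(X @ w + b)

def squared_error_cost(X, y, w, b):
    """Squared error cost (1/(2m)) * sum((f(x) - y)^2).
    With the sigmoid inside f(x), this surface is non-convex in (w, b),
    which is why it is not used for logistic regression."""
    m = X.shape[0]
    f = predict(X, w, b)
    return np.sum((f - y) ** 2) / (2 * m)
```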

Instead of squared error, we define a new loss function for a single training example:

If y = 1: Loss = -log(f(x))
If y = 0: Loss = -log(1 - f(x))

When y = 1:

  • If f(x) ≈ 1 (correct prediction): Loss ≈ 0 (very small penalty)
  • If f(x) = 0.5: Loss = -log(0.5) ≈ 0.69 (moderate penalty)
  • If f(x) ≈ 0 (wrong prediction): Loss → ∞ (very large penalty)

The loss function encourages the algorithm to output high probabilities for positive examples.

When y = 0:

  • If f(x) ≈ 0 (correct prediction): Loss ≈ 0 (very small penalty)
  • If f(x) = 0.5: Loss = -log(0.5) ≈ 0.69 (moderate penalty)
  • If f(x) ≈ 1 (wrong prediction): Loss → ∞ (very large penalty)

The loss function encourages the algorithm to output low probabilities for negative examples.
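
The behavior described in both cases can be checked numerically. A small sketch (the helper name logistic_loss is illustrative; printed values are rounded):

```python
import numpy as np

def logistic_loss(f, y):
    """Per-example loss: -log(f) if y == 1, -log(1 - f) if y == 0."""
    return -np.log(f) if y == 1 else -np.log(1.0 - f)

for f in [0.99, 0.5, 0.01]:
    print(f"f(x) = {f:4.2f}   loss if y=1: {logistic_loss(f, 1):6.3f}   "
          f"loss if y=0: {logistic_loss(f, 0):6.3f}")

# f(x) = 0.99   loss if y=1:  0.010   loss if y=0:  4.605
# f(x) = 0.50   loss if y=1:  0.693   loss if y=0:  0.693
# f(x) = 0.01   loss if y=1:  4.605   loss if y=0:  0.010
```

Confident, correct predictions cost almost nothing; confident, wrong predictions are penalized heavily, in either direction.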

Why this loss function works well:

  • Convex Function: creates a convex cost surface, ensuring gradient descent finds the global minimum
  • Penalizes Wrong Predictions: large penalties for confident but incorrect predictions
  • Smooth Gradients: provides smooth gradients for reliable optimization

This cost function is derived from the statistical principle of maximum likelihood estimation, which provides a principled way to find optimal parameters for probabilistic models.
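
Although the two cases are stated separately above, they are conventionally folded into a single expression, Loss = -y*log(f(x)) - (1 - y)*log(1 - f(x)), and the overall cost J(w,b) is the average loss over the m training examples. A minimal sketch of that cost, reusing the hypothetical predict helper from the earlier snippet:

```python
import numpy as np

def logistic_cost(X, y, w, b, eps=1e-15):
    """Average logistic (cross-entropy) cost over m examples:
    J(w, b) = -(1/m) * sum(y * log(f) + (1 - y) * log(1 - f))."""
    m = X.shape[0]
    f = predict(X, w, b)             # sigmoid(X @ w + b), as defined earlier
    f = np.clip(f, eps, 1.0 - eps)   # keep log() away from 0 for stability
    return -np.sum(y * np.log(f) + (1 - y) * np.log(1.0 - f)) / m
```

Because this cost is convex in w and b, gradient descent can reach the global minimum rather than getting stuck in local minima.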

The logistic loss function replaces squared error to create a convex optimization problem suitable for binary classification. By heavily penalizing confident but wrong predictions, it encourages the model to output appropriate probabilities for each class, leading to better classification performance.