Pablo Rodriguez

Gradient Descent Implementation

Applying Gradient Descent to Logistic Regression


The goal is to find the parameters w and b that minimize the logistic regression cost function:

J(w,b) = -(1/m) * Σ[y⁽ⁱ⁾*log(f(x⁽ⁱ⁾)) + (1-y⁽ⁱ⁾)*log(1-f(x⁽ⁱ⁾))]

Gradient descent repeatedly applies the updates:

w_j := w_j - α * (∂J/∂w_j)
b := b - α * (∂J/∂b)

where α is the learning rate and j runs from 1 to n (the number of features).

Using calculus on the logistic cost function, the partial derivatives work out to:

∂J/∂w_j = (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾) * x_j⁽ⁱ⁾]
∂J/∂b = (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)]

Each iteration of gradient descent then:

  1. Computes all derivatives for the current parameters
  2. Updates all parameters simultaneously using:
    • w_j := w_j - α * (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾) * x_j⁽ⁱ⁾]
    • b := b - α * (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)]
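The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not a library implementation; the function and variable names (`gradient_step`, `dj_dw`, `dj_db`) are chosen here for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, alpha):
    """One gradient descent step for logistic regression.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weights, b: scalar bias, alpha: learning rate.
    """
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        f_i = sigmoid(np.dot(w, X[i]) + b)   # prediction f(x^(i))
        err = f_i - y[i]                     # (f(x^(i)) - y^(i))
        for j in range(n):
            dj_dw[j] += err * X[i, j]        # accumulate ∂J/∂w_j terms
        dj_db += err                         # accumulate ∂J/∂b terms
    dj_dw /= m
    dj_db /= m
    # Simultaneous update: both gradients are computed first,
    # then both parameters are updated together.
    return w - alpha * dj_dw, b - alpha * dj_db
```

Computing both gradients before touching w or b is what makes the update "simultaneous"; updating w first and then using the new w to compute ∂J/∂b would be a subtle bug.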

The gradient descent updates look exactly the same as linear regression:

  • Same derivative formulas
  • Same update structure
  • Same simultaneous update requirement

The crucial difference is in the definition of f(x):

Linear Regression

f(x) = w·x + b

Logistic Regression

f(x) = 1/(1 + e^(-(w·x + b)))

Even though the update equations appear identical, they represent completely different algorithms due to the different function definitions.
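The difference can be made concrete with two small functions (illustrative names, assuming NumPy):

```python
import numpy as np

def f_linear(x, w, b):
    # Linear regression model: unbounded real-valued output
    return np.dot(w, x) + b

def f_logistic(x, w, b):
    # Logistic regression model: sigmoid squashes the same
    # linear term into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
```

Plugging either function into the same update equations yields two different algorithms: one fits a line, the other fits a probability.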

To verify that training is working:

  • Use the same convergence monitoring techniques as linear regression
  • Plot cost function vs. iterations to verify decreasing cost
  • Check for appropriate learning rate selection
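One way to monitor convergence is to record the cost at every iteration and inspect the resulting curve. A sketch, assuming NumPy (the epsilon guard and function names are choices made here, not part of the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(X, y, w, b):
    """Logistic cost J(w, b); a small epsilon guards against log(0)."""
    f = sigmoid(X @ w + b)
    eps = 1e-15
    return -np.mean(y * np.log(f + eps) + (1 - y) * np.log(1 - f + eps))

def run_gradient_descent(X, y, alpha=0.1, iters=200):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    history = []
    for _ in range(iters):
        f = sigmoid(X @ w + b)
        w = w - alpha * (X.T @ (f - y)) / m
        b = b - alpha * np.mean(f - y)
        history.append(logistic_cost(X, y, w, b))  # record cost per iteration
    return w, b, history
```

A `history` that decreases steadily suggests the learning rate is reasonable; a curve that oscillates or grows points to a learning rate that is too large.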

Feature scaling remains beneficial for logistic regression:

  • Scales features to similar ranges (e.g., -1 to +1)
  • Helps gradient descent converge faster
  • Prevents features with large scales from dominating
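One common scaling scheme is z-score normalization; min-max scaling to a range like -1 to +1 serves the same purpose. A minimal sketch, assuming NumPy:

```python
import numpy as np

def zscore_scale(X):
    """Z-score normalization: each feature gets mean 0 and std 1.
    Returns the scaled matrix plus mu and sigma so the same
    transformation can be applied to new examples later."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```

Note that `mu` and `sigma` must be computed from the training data and reused at prediction time, so new inputs are scaled the same way.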

The algorithm can be vectorized for computational efficiency:

  • Implement matrix operations instead of loops
  • Significantly faster execution on large datasets
  • Same principles as vectorized linear regression
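The double loop over examples and features can be replaced with matrix operations. A sketch of the vectorized gradient computation, assuming NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients_vectorized(X, y, w, b):
    """Compute both gradients with matrix operations, no Python loops.

    X: (m, n), y: (m,), w: (n,), b: scalar.
    """
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y    # (m,) vector of f(x^(i)) - y^(i)
    dj_dw = (X.T @ err) / m         # (n,) — replaces the loop over i and j
    dj_db = err.mean()              # scalar — replaces the loop over i
    return dj_dw, dj_db
```

The result is numerically identical to the looped version but runs far faster on large datasets, since NumPy dispatches the matrix products to optimized compiled routines.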

Gradient descent for logistic regression uses the same algorithmic structure as linear regression but with the sigmoid function defining f(x). The resulting updates look identical in form but solve a fundamentally different optimization problem, making logistic regression suitable for classification tasks.