Gradient Descent for Logistic Regression
Find parameters w and b that minimize the logistic regression cost function:
J(w,b) = -(1/m) * Σ[y⁽ⁱ⁾*log(f(x⁽ⁱ⁾)) + (1-y⁽ⁱ⁾)*log(1-f(x⁽ⁱ⁾))]
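As an illustration, here is a minimal NumPy sketch of this cost computation. The names sigmoid, compute_cost, X, y, w, and b are our own choices for the example, not anything defined in the notes above.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    """Average cross-entropy cost J(w, b) over the m training examples.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1},
    w: (n,) weight vector, b: scalar bias.
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)          # f(x^(i)) for every example at once
    eps = 1e-15                     # guards log(0) when predictions saturate
    return -(1.0 / m) * np.sum(y * np.log(f + eps) + (1 - y) * np.log(1 - f + eps))

# Tiny illustrative call with made-up data:
X = np.array([[1.0, 2.0], [2.0, 0.5]])
y = np.array([1, 0])
print(compute_cost(X, y, w=np.array([0.1, -0.2]), b=0.0))
```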
w_j := w_j - α * (∂J/∂w_j)
b := b - α * (∂J/∂b)
Where α is the learning rate and j goes from 1 to n (number of features).
Taking the partial derivatives of the logistic cost function gives:
∂J/∂w_j = (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾) * x_j⁽ⁱ⁾]
∂J/∂b = (1/m) * Σ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)]
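These two partial derivatives translate directly into a few lines of vectorized NumPy. The following sketch, with the hypothetical helper name compute_gradients and toy training data, computes both gradients and applies one simultaneous update of w and b.

```python
import numpy as np

def compute_gradients(X, y, w, b):
    """Gradients of J with respect to w and b, matching the formulas above."""
    m = X.shape[0]
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # f(x^(i)) = sigmoid(w·x^(i) + b)
    error = f - y                             # f(x^(i)) - y^(i)
    dj_dw = (X.T @ error) / m                 # ∂J/∂w_j = (1/m) Σ error_i * x_j^(i)
    dj_db = np.sum(error) / m                 # ∂J/∂b   = (1/m) Σ error_i
    return dj_dw, dj_db

# One simultaneous update with learning rate alpha (toy data for illustration):
X = np.array([[0.5, 1.5], [1.0, 1.0], [2.0, 0.5]])
y = np.array([0, 0, 1])
w, b, alpha = np.zeros(2), 0.0, 0.1
dj_dw, dj_db = compute_gradients(X, y, w, b)
w, b = w - alpha * dj_dw, b - alpha * dj_db
```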
These gradient descent updates look exactly the same as those for linear regression. The crucial difference is in the definition of f(x):
Linear Regression
f(x) = w·x + b
Logistic Regression
f(x) = 1/(1 + e^(-(w·x + b)))
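To make the contrast concrete, the sketch below defines both model functions side by side. The names f_linear and f_logistic and the example inputs are illustrative only.

```python
import numpy as np

def f_linear(x, w, b):
    """Linear regression model: f(x) = w·x + b, an unbounded real value."""
    return np.dot(w, x) + b

def f_logistic(x, w, b):
    """Logistic regression model: f(x) = sigmoid(w·x + b), always in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([2.0, -1.0])
w, b = np.array([0.5, 1.0]), -0.25
print(f_linear(x, w, b))    # any real number
print(f_logistic(x, w, b))  # a probability-like value between 0 and 1
```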
Even though the update equations appear identical, they represent completely different algorithms due to the different function definitions.
Feature scaling remains beneficial for logistic regression: standardizing the input features helps gradient descent converge faster, just as it does for linear regression.
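As one common choice, z-score normalization can be applied to the training features before running gradient descent. The function name zscore_normalize and the sample matrix below are hypothetical.

```python
import numpy as np

def zscore_normalize(X):
    """Standardize each feature column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Scale training features; keep mu and sigma to transform future inputs identically.
X_raw = np.array([[2104.0, 5.0], [1416.0, 3.0], [852.0, 2.0]])
X_norm, mu, sigma = zscore_normalize(X_raw)
```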
The algorithm can also be vectorized for computational efficiency, computing all predictions and gradients with matrix operations rather than explicit loops over examples and features.
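Putting the pieces together, a vectorized training loop might look like the following sketch. The function name, hyperparameter values, and toy dataset are assumptions made for illustration.

```python
import numpy as np

def gradient_descent_logistic(X, y, alpha=0.1, num_iters=1000):
    """Vectorized gradient descent for logistic regression.

    Each iteration computes all m predictions and both gradients with
    matrix operations instead of looping over examples and features.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of all m predictions at once
        error = f - y
        w -= alpha * (X.T @ error) / m          # all ∂J/∂w_j in one matrix product
        b -= alpha * np.sum(error) / m
    return w, b

# Toy usage (hypothetical data):
X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5], [2.0, 2.0], [1.0, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = gradient_descent_logistic(X, y, alpha=0.1, num_iters=5000)
```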
Gradient descent for logistic regression uses the same algorithmic structure as linear regression but with the sigmoid function defining f(x). The resulting updates look identical in form but solve a fundamentally different optimization problem, making logistic regression suitable for classification tasks.