Running Gradient Descent

Observing gradient descent step-by-step shows how the algorithm systematically finds the optimal parameters for linear regression.

Display Components

  • Upper left: Model function f(x) and the training data
  • Upper right: Contour plot of the cost function J(w,b)
  • Bottom: 3D surface plot of the same cost function
  • Real-time updates: All plots update simultaneously during optimization
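One way to generate contour and surface views like these is to evaluate J(w,b) over a grid of parameter values. The sketch below is a minimal illustration assuming a small made-up dataset; the names (x_train, y_train, compute_cost) are illustrative, not the lab's actual code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up training data for illustration (not the lecture's dataset)
x_train = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])
y_train = np.array([250.0, 300.0, 480.0, 430.0, 630.0, 730.0])

def compute_cost(w, b, x, y):
    """Squared-error cost J(w,b) = (1/2m) * sum_i (w*x_i + b - y_i)^2."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

# Evaluate J over a grid of (w, b) values for the contour and surface panels
ws = np.linspace(-200, 600, 100)
bs = np.linspace(-400, 1200, 100)
W, B = np.meshgrid(ws, bs)
J = np.array([[compute_cost(w, b, x_train, y_train) for w in ws] for b in bs])

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1)                    # contour view of J(w,b)
ax1.contour(W, B, np.log(J + 1), levels=30)       # log scale makes the rings visible
ax1.set_xlabel("w"); ax1.set_ylabel("b")
ax2 = fig.add_subplot(1, 2, 2, projection="3d")   # surface view of the same J
ax2.plot_surface(W, B, J, cmap="viridis")
ax2.set_xlabel("w"); ax2.set_ylabel("b")
plt.show()
```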

  • Starting values: w = -0.1, b = 900
  • Initial function: f(x) = -0.1x + 900
  • Starting position: Point on the cost function corresponding to these parameters

  • Parameter update: Move from initial point to new position on cost function
  • Direction: Down and to the right on contour plot
  • Function change: Slight modification to the line fit
  • Cost reduction: Lower cost value achieved
  • Continued movement: Further progress toward minimum
  • Line improvement: Better fit to the training data
  • Cost decrease: Additional reduction in cost function
  • Trajectory: Parameters follow curved path on contour plot
  • Convergence: Gradual approach to global minimum
  • Line evolution: Progressively better fits to the data (this loop is sketched in code below)
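A minimal sketch of the update loop these steps describe, starting from the demonstration's initial values w = -0.1 and b = 900. The dataset and learning rate are assumptions for illustration, not the lab's actual code:

```python
import numpy as np

# Made-up training data for illustration (same assumption as above)
x = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])
y = np.array([250.0, 300.0, 480.0, 430.0, 630.0, 730.0])
m = x.shape[0]

w, b = -0.1, 900.0      # starting values from the demonstration
alpha = 0.1             # learning rate (assumed)
cost_history = []       # J(w,b) after each step, for plotting later

for step in range(5000):
    f = w * x + b                          # current predictions f(x) = wx + b
    dj_dw = np.sum((f - y) * x) / m        # dJ/dw
    dj_db = np.sum(f - y) / m              # dJ/db
    w, b = w - alpha * dj_dw, b - alpha * dj_db   # simultaneous update
    cost_history.append(np.sum((w * x + b - y) ** 2) / (2 * m))

print(f"w = {w:.1f}, b = {b:.1f}, cost = {cost_history[-1]:.1f}")
```

Plotting the successive (w, b) pairs on the contour plot reproduces the curved path toward the minimum described above.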

  • Global minimum: Center of the smallest contour ellipse
  • Best fit line: Optimal straight line through the data
  • Minimized cost: Lowest cost value achievable for this dataset
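Because linear regression's squared-error cost has a single global minimum, the converged values can be sanity-checked against the closed-form least-squares solution (a hypothetical check continuing the sketch above, not part of the lab):

```python
# Closed-form least-squares fit: the exact location of the global minimum
A = np.vstack([x, np.ones_like(x)]).T
w_opt, b_opt = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"global minimum at w = {w_opt:.1f}, b = {b_opt:.1f}")
```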

Technical Term: Batch Gradient Descent

  • Definition: Uses all training examples in each update step
  • Computation: The derivative terms sum over all examples, from i = 1 to m (spelled out in code after the list below)
  • Alternative: Other versions use subsets of the training data in each step
  • Standard choice: The batch version is standard for linear regression

  • Complete dataset: Looks at entire “batch” of training examples
  • Each iteration: Processes all m training examples
  • Contrast: Mini-batch or stochastic versions use fewer examples per update
  • Newsletter reference: “The Batch” by DeepLearning.AI named after this concept
  • Systematic improvement: Cost decreases with each iteration
  • Stable trajectory: Smooth path to minimum
  • Predictable behavior: Consistent convergence pattern
  • Optimal solution: Reaches global minimum reliably
  • Contour movement: Path shows algorithm navigating cost landscape
  • Cost reduction: Measurable decrease in J(w,b) each step
  • Line improvement: Visual confirmation of better data fit
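The sum from i = 1 to m in the derivatives is exactly what makes this the batch version: every example contributes to every update. Here is a sketch of those derivative calculations written as an explicit loop over the whole batch (the function name is an assumption for illustration):

```python
def compute_gradient(w, b, x, y):
    """Derivatives of J(w,b) for batch gradient descent:
    dJ/dw = (1/m) * sum_{i=1..m} (f(x_i) - y_i) * x_i
    dJ/db = (1/m) * sum_{i=1..m} (f(x_i) - y_i)
    """
    m = len(x)
    dj_dw, dj_db = 0.0, 0.0
    for i in range(m):                  # every training example, every update
        err = (w * x[i] + b) - y[i]
        dj_dw += err * x[i]
        dj_db += err
    return dj_dw / m, dj_db / m
```

A stochastic or mini-batch variant would replace range(m) with a single sampled index or a small subset, which cheapens each update but gives up the smooth, predictable trajectory listed above.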

The accompanying lab demonstrates:

  • Code implementation: See gradient descent algorithm in code
  • Cost tracking: Plot showing cost decrease over iterations (reproduced in the snippet after this list)
  • Contour visualization: Watch parameters move toward minimum
  • Interactive exploration: Understand algorithm behavior
  • Code familiarity: No writing required, just read and run
  • Algorithm understanding: See mathematical concepts in action
  • Implementation skills: Prepare for future coding tasks
  • Visual learning: Multiple perspectives on same algorithm
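For example, the cost-tracking plot can be reproduced from the cost_history list recorded in the earlier loop (a hypothetical continuation of that sketch, not the lab's code):

```python
import matplotlib.pyplot as plt

# Cost J(w,b) versus iteration: should decrease on every step
plt.plot(cost_history)
plt.xlabel("iteration")
plt.ylabel("cost J(w,b)")
plt.show()
```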

  • Achievement: Successfully implemented a first machine learning algorithm
  • Foundation: Understanding applies to more complex models
  • Next steps: More powerful linear regression variations
  • Skill development: Practical machine learning system design

Gradient descent systematically transforms initial parameter guesses into optimal values by repeatedly stepping in the direction of the negative gradient of the cost function. The visual demonstration shows how this mathematical optimization translates into practical model improvement, producing a model capable of making accurate predictions on new data.