Running Gradient Descent

Observing gradient descent step-by-step shows how the algorithm systematically finds the optimal parameters for linear regression.

Display Components

  • Upper left: Model function f(x) and the training data
  • Upper right: Contour plot of the cost function J(w,b)
  • Bottom: 3D surface plot of the same cost function
  • Real-time updates: All plots update simultaneously during optimization
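One way to generate contour and surface views like these is to evaluate J(w,b) over a grid of parameter values. The sketch below is a minimal illustration assuming a small made-up dataset; the names (x_train, y_train, compute_cost) are illustrative, not the lab's actual code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up training data for illustration (not the lecture's dataset)
x_train = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])
y_train = np.array([250.0, 300.0, 480.0, 430.0, 630.0, 730.0])

def compute_cost(w, b, x, y):
    """Squared-error cost J(w,b) = (1/2m) * sum_i (w*x_i + b - y_i)^2."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

# Evaluate J over a grid of (w, b) values for the contour and surface panels
ws = np.linspace(-200, 600, 100)
bs = np.linspace(-400, 1200, 100)
W, B = np.meshgrid(ws, bs)
J = np.array([[compute_cost(w, b, x_train, y_train) for w in ws] for b in bs])

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1)                    # contour view of J(w,b)
ax1.contour(W, B, np.log(J + 1), levels=30)       # log scale makes the rings visible
ax1.set_xlabel("w"); ax1.set_ylabel("b")
ax2 = fig.add_subplot(1, 2, 2, projection="3d")   # surface view of the same J
ax2.plot_surface(W, B, J, cmap="viridis")
ax2.set_xlabel("w"); ax2.set_ylabel("b")
plt.show()
```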

  • Starting values: w = -0.1, b = 900
  • Initial function: f(x) = -0.1x + 900
  • Starting position: Point on the cost function corresponding to these parameters

  • Parameter update: Move from initial point to new position on cost function
  • Direction: Down and to the right on contour plot
  • Function change: Slight modification to the line fit
  • Cost reduction: Lower cost value achieved
  • Continued movement: Further progress toward minimum
  • Line improvement: Better fit to the training data
  • Cost decrease: Additional reduction in cost function
  • Trajectory: Parameters follow curved path on contour plot
  • Convergence: Gradual approach to global minimum
  • Line evolution: Progressively better fits to the data (this loop is sketched in code below)
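A minimal sketch of the update loop these steps describe, starting from the demonstration's initial values w = -0.1 and b = 900. The dataset and learning rate are assumptions for illustration, not the lab's actual code:

```python
import numpy as np

# Made-up training data for illustration (same assumption as above)
x = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])
y = np.array([250.0, 300.0, 480.0, 430.0, 630.0, 730.0])
m = x.shape[0]

w, b = -0.1, 900.0      # starting values from the demonstration
alpha = 0.1             # learning rate (assumed)
cost_history = []       # J(w,b) after each step, for plotting later

for step in range(5000):
    f = w * x + b                          # current predictions f(x) = wx + b
    dj_dw = np.sum((f - y) * x) / m        # dJ/dw
    dj_db = np.sum(f - y) / m              # dJ/db
    w, b = w - alpha * dj_dw, b - alpha * dj_db   # simultaneous update
    cost_history.append(np.sum((w * x + b - y) ** 2) / (2 * m))

print(f"w = {w:.1f}, b = {b:.1f}, cost = {cost_history[-1]:.1f}")
```

Plotting the successive (w, b) pairs on the contour plot reproduces the curved path toward the minimum described above.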

  • Global minimum: Center of the smallest contour ellipse
  • Best fit line: Optimal straight line through the data
  • Minimized cost: Lowest cost value achievable for this dataset
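Because linear regression's squared-error cost has a single global minimum, the converged values can be sanity-checked against the closed-form least-squares solution (a hypothetical check continuing the sketch above, not part of the lab):

```python
# Closed-form least-squares fit: the exact location of the global minimum
A = np.vstack([x, np.ones_like(x)]).T
w_opt, b_opt = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"global minimum at w = {w_opt:.1f}, b = {b_opt:.1f}")
```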

Technical Term: Batch Gradient Descent

  • Definition: Uses all training examples in each update step
  • Computation: The derivative terms sum over all examples, from i = 1 to m (spelled out in code after the list below)
  • Alternative: Other versions use subsets of the training data in each step
  • Standard choice: The batch version is standard for linear regression

  • Complete dataset: Looks at entire “batch” of training examples
  • Each iteration: Processes all m training examples
  • Contrast: Mini-batch or stochastic versions use fewer examples per update
  • Newsletter reference: “The Batch” by DeepLearning.AI named after this concept
  • Systematic improvement: Cost decreases with each iteration
  • Stable trajectory: Smooth path to minimum
  • Predictable behavior: Consistent convergence pattern
  • Optimal solution: Reaches global minimum reliably
  • Contour movement: Path shows algorithm navigating cost landscape
  • Cost reduction: Measurable decrease in J(w,b) each step
  • Line improvement: Visual confirmation of better data fit
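The sum from i = 1 to m in the derivatives is exactly what makes this the batch version: every example contributes to every update. Here is a sketch of those derivative calculations written as an explicit loop over the whole batch (the function name is an assumption for illustration):

```python
def compute_gradient(w, b, x, y):
    """Derivatives of J(w,b) for batch gradient descent:
    dJ/dw = (1/m) * sum_{i=1..m} (f(x_i) - y_i) * x_i
    dJ/db = (1/m) * sum_{i=1..m} (f(x_i) - y_i)
    """
    m = len(x)
    dj_dw, dj_db = 0.0, 0.0
    for i in range(m):                  # every training example, every update
        err = (w * x[i] + b) - y[i]
        dj_dw += err * x[i]
        dj_db += err
    return dj_dw / m, dj_db / m
```

A stochastic or mini-batch variant would replace range(m) with a single sampled index or a small subset, which cheapens each update but gives up the smooth, predictable trajectory listed above.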

The accompanying lab demonstrates:

  • Code implementation: See gradient descent algorithm in code
  • Cost tracking: Plot showing cost decrease over iterations (reproduced in the snippet after this list)
  • Contour visualization: Watch parameters move toward minimum
  • Interactive exploration: Understand algorithm behavior
  • Code familiarity: No writing required, just read and run
  • Algorithm understanding: See mathematical concepts in action
  • Implementation skills: Prepare for future coding tasks
  • Visual learning: Multiple perspectives on same algorithm
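For example, the cost-tracking plot can be reproduced from the cost_history list recorded in the earlier loop (a hypothetical continuation of that sketch, not the lab's code):

```python
import matplotlib.pyplot as plt

# Cost J(w,b) versus iteration: should decrease on every step
plt.plot(cost_history)
plt.xlabel("iteration")
plt.ylabel("cost J(w,b)")
plt.show()
```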

  • Achievement: Successfully implemented a first machine learning algorithm
  • Foundation: Understanding applies to more complex models
  • Next steps: More powerful linear regression variations
  • Skill development: Practical machine learning system design

Gradient descent systematically transforms initial parameter guesses into optimal values by repeatedly stepping in the direction of the negative gradient of the cost function. The visual demonstration shows how this mathematical optimization translates into practical model improvement, producing a model capable of making accurate predictions on new data.