Pablo Rodriguez

Visualization

When working with the full linear regression model f(x) = wx + b, the cost function J(w,b) depends on both parameters, so plotting it produces a 3D surface.
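For reference, the cost being plotted can be written out explicitly. The form below is a sketch that assumes the squared-error cost with a 1/(2m) scaling over m training examples (x^(i), y^(i)); this section does not state the scaling itself, so treat that factor as an assumption.

```latex
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right)^2,
\qquad f(x) = wx + b
```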

3D Surface Plot Characteristics

  • Shape: Bowl-like surface (similar to a soup bowl, dinner plate, or hammock)
  • Axes: w and b as horizontal axes, J(w,b) as vertical axis
  • Points: Each (w,b) combination corresponds to a point on the surface
  • Height: Represents the cost value for those parameters

  • Location: Any point on surface represents specific w and b values
  • Example: w = -10, b = -15 corresponds to a point on the surface
  • Height: Height of the surface above that point equals J(-10, -15)
  • Global minimum: Single lowest point on the entire surface
  • Convex function: Technical term for bowl-shaped function
  • No other local minima: The global minimum is the only minimum on the surface
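The surface can be explored numerically by evaluating the cost on a grid of (w,b) values. The sketch below is only illustrative: the toy data, the grid range, the helper name compute_cost, and the 1/(2m) scaling are all assumptions rather than anything taken from this section.

```python
def compute_cost(xs, ys, w, b):
    """Squared-error cost, assuming the 1/(2m) scaling convention."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Evaluate J on a coarse grid of integer (w, b) values. Each entry is the
# height of the surface above that (w, b) point.
w_values = range(-5, 6)
b_values = range(-5, 6)
surface = {(w, b): compute_cost(xs, ys, w, b) for w in w_values for b in b_values}

# The lowest grid point is the bottom of the bowl.
best_w, best_b = min(surface, key=surface.get)
print(best_w, best_b, surface[(best_w, best_b)])

# Convexity spot check: the cost at the midpoint of two parameter settings
# does not exceed the average of the two costs.
p, q = (-4, 5), (5, -3)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
assert compute_cost(xs, ys, *mid) <= (surface[p] + surface[q]) / 2
```

Because the toy data sits exactly on a line, the grid minimum lands on that line's slope and intercept with zero cost. Real data would give a positive minimum cost.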

Contour plots provide a 2D representation of the same 3D cost function, similar to topographical maps.

  • Mountain example: Mount Fuji topographical map shows elevation contours
  • Horizontal slices: Each contour line represents points at the same height
  • Aerial view: Looking down from above shows the contour pattern

  • Process: Take horizontal slices of the 3D bowl
  • Result: Each slice becomes an oval/ellipse on the 2D plot
  • Meaning: Points on the same contour have identical cost values

Equal cost points: All points on the same contour line (for example, any three of them) have identical J(w,b) values despite having different w and b combinations

Minimum location: Center of concentric ovals represents the global minimum

Cost levels: Inner ovals represent lower costs, outer ovals represent higher costs
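The contour properties above can be checked numerically. This is a minimal sketch on hypothetical data: it locates the minimum with the closed-form least-squares line (a technique swapped in for illustration, not one this section describes), then uses the fact that a quadratic cost is symmetric about its minimum, so two steps in opposite directions land on the same contour. The helper names and the 1/(2m) scaling are assumptions.

```python
import math

def compute_cost(xs, ys, w, b):
    """Squared-error cost, assuming the 1/(2m) scaling convention."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def least_squares_line(xs, ys):
    """Closed-form minimizer of the cost for a single input feature."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / sxx
    return w, y_mean - w * x_mean

# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 4.5, 5.0]

# Center of the concentric ovals.
w_min, b_min = least_squares_line(xs, ys)
j_min = compute_cost(xs, ys, w_min, b_min)

# Two different (w, b) points on opposite sides of the center.
dw, db = 0.5, -1.0
j_a = compute_cost(xs, ys, w_min + dw, b_min + db)
j_b = compute_cost(xs, ys, w_min - dw, b_min - db)

# A point halfway toward the center sits on an inner oval.
j_inner = compute_cost(xs, ys, w_min + dw / 2, b_min + db / 2)

assert math.isclose(j_a, j_b)      # same contour, different parameters
assert j_min < j_inner < j_a       # inner ovals have lower cost
```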

Imagine viewing the cost function as if:

  • Your monitor is lying flat on your desk
  • The bowl shape rises up from the screen
  • You’re looking down from above
  • Each oval represents a “height ring” on the bowl
  • Moving inward: Decreasing cost (better model performance)
  • Moving outward: Increasing cost (worse model performance)
  • Center target: Global minimum represents optimal parameters

Understanding these visualizations helps with:

  • Parameter selection: Identify regions of good performance
  • Algorithm behavior: See how optimization algorithms navigate the cost landscape
  • Model improvement: Understand relationship between parameter changes and performance

Both 3D surface plots and 2D contour plots represent the same mathematical relationship between parameters (w,b) and cost J(w,b), providing different perspectives for understanding how to find optimal model parameters.

Understanding how different parameter choices affect both the model function and cost helps build intuition for optimization.

Example 1 parameters: w ≈ -0.15, b ≈ 800

Function: f(x) = -0.15x + 800

Line characteristics:

  • Y-intercept at 800 (where line crosses vertical axis)
  • Negative slope of -0.15 (downward sloping)

Performance: Not a good fit to the training data

  • Predictions vs. Actual: Many predictions are far from target values
  • Visual assessment: Line doesn’t follow the data pattern well
  • Cost location: Point on cost function is far from minimum
  • Cost value: High cost due to poor fit

Example 2 parameters: w = 0, b ≈ 360

Function: f(x) = 0x + 360 = 360

Line characteristics:

  • Horizontal line (flat)
  • Constant prediction of 360 regardless of input

Performance: Still not great, but slightly better than Example 1

  • Constant prediction: Always predicts 360 for any house size
  • Cost improvement: Closer to minimum than previous example
  • Still suboptimal: Doesn’t capture size-price relationship
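The comparison between the first two parameter choices can be reproduced in code. The training set below is hypothetical, not the data behind these examples. The units (square feet, thousands of dollars), the helper name compute_cost, and the 1/(2m) scaling are likewise assumptions, so only the ordering of the two costs is meaningful.

```python
def compute_cost(xs, ys, w, b):
    """Squared-error cost, assuming the 1/(2m) scaling convention."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical training set: house sizes (sq ft) and prices (thousands of dollars).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 290, 400, 480, 610]

# Downward-sloping line with intercept 800.
cost_example_1 = compute_cost(sizes, prices, w=-0.15, b=800)

# Flat line that always predicts 360.
cost_example_2 = compute_cost(sizes, prices, w=0, b=360)

print(f"w = -0.15, b = 800: cost {cost_example_1:.1f}")
print(f"w = 0,     b = 360: cost {cost_example_2:.1f}")
```

On this made-up data the flat line has the lower cost of the two, although neither captures the upward trend in price with size.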

Example 3 parameters: Poor w and b combination

  • Result: Line that’s further from optimal compared to previous examples
  • Cost position: Even further from the minimum
  • Performance: Worse fit than the previous two examples

Example 4 parameters: Near-optimal w and b values

Function: f(x) with parameters close to optimal

Performance: Pretty good fit to the training set

Cost location: Very close to center of smallest ellipse (near global minimum)

  • Visual fit: Line passes close to most data points
  • Error measurement: Small vertical distances between data points and prediction line
  • Cost value: Very close to minimum possible cost
  • Sum of squared errors: Near the minimum among all possible straight lines

  • w (slope): Controls line steepness and direction
  • b (y-intercept): Controls where line crosses vertical axis
  • Combined effect: Together determine line position and orientation

For any parameter choice, you can measure fit quality by:

  1. Draw prediction line: Based on chosen w and b values
  2. Measure vertical distances: From each data point to the line
  3. Calculate squared errors: Square each distance so that positive and negative errors cannot cancel out
  4. Sum total error: Add all squared errors together
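The four steps above can be sketched directly in code. The function name fit_quality and the toy data are hypothetical. The sketch returns the plain sum of squared errors with no scaling factor, as in step 4.

```python
def fit_quality(xs, ys, w, b):
    """Sum of squared vertical distances between the data and the line wx + b."""
    # Step 1: prediction line for the chosen w and b.
    predictions = [w * x + b for x in xs]
    # Step 2: signed vertical distance from each data point to the line.
    errors = [pred - y for pred, y in zip(predictions, ys)]
    # Step 3: square each distance.
    squared_errors = [e ** 2 for e in errors]
    # Step 4: add all squared errors together.
    return sum(squared_errors)

# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

poor_fit = fit_quality(xs, ys, w=1.0, b=0.0)     # errors -1, -2, -3
perfect_fit = fit_quality(xs, ys, w=2.0, b=0.0)  # line passes through every point
print(poor_fit, perfect_fit)
```

A lower total means the line sits closer to the data, which is the same ordering the cost surface and contour plot display.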

The optional lab following this section provides:

  • Cost function implementation: See the math in code
  • Interactive contour plots: Click anywhere to see corresponding line
  • 3D surface exploration: Rotate and examine the cost function
  • Parameter experimentation: Test different w and b combinations

Rather than manually trying different parameters:

  • Need efficient algorithm: Automatically find optimal w and b
  • Gradient descent: Algorithm that systematically finds the minimum
  • Next topic: Learn how gradient descent navigates the cost landscape
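As a preview only, the idea of stepping downhill on the cost surface can be sketched in a few lines. This assumes plain batch gradient descent with a fixed learning rate on the 1/(2m) squared-error cost. The learning rate alpha, the step count, the starting point, and the toy data are all hypothetical choices, and the algorithm itself is covered in the next topic.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=10_000):
    """Repeatedly move (w, b) a small step against the gradient of the cost."""
    m = len(xs)
    w, b = 0.0, 0.0  # arbitrary starting point on the surface
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the 1/(2m) squared-error cost.
        dj_dw = sum(e * x for e, x in zip(errors, xs)) / m
        dj_db = sum(errors) / m
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Hypothetical data lying exactly on y = 200x + 100.
xs = [1.0, 2.0, 3.0]
ys = [300.0, 500.0, 700.0]

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

The inputs here are deliberately small in magnitude. With unscaled values such as raw square footage, this fixed alpha would be too large and the updates would diverge.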

Understanding these examples shows why systematic optimization is necessary: manual parameter selection is inefficient and unlikely to find the true optimum.