Cost Function Purpose
Goal: Tell us how well the model is performing
Benefit: Helps improve the model by identifying areas for adjustment
Application: Used across all types of machine learning models
The cost function is one of the most universal and important concepts in machine learning, used both in linear regression and in training some of the most advanced AI models in the world.
Training Set: Contains input features x and output targets y
Model: Linear function f_w,b(x) = wx + b
Parameters: w and b (variables adjusted during training to improve the model)
Different values of w and b create different functions and different lines:
w = 0, b = 1.5: f(x) = 1.5 (horizontal line, constant prediction)
w = 0.5, b = 0: f(x) = 0.5x (line through origin)
w = 0.5, b = 1: f(x) = 0.5x + 1
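As a minimal sketch of this behavior (assuming NumPy is available; the helper name predict is just illustrative), the snippet below evaluates f_w,b(x) = wx + b for the three parameter settings above:

```python
import numpy as np

def predict(x, w, b):
    """Linear model f_w,b(x) = w * x + b."""
    return w * x + b

x = np.array([1.0, 2.0, 3.0])  # a few example input values

# The three parameter settings discussed above give three different lines
for w, b in [(0.0, 1.5), (0.5, 0.0), (0.5, 1.0)]:
    print(f"w = {w}, b = {b} -> f(x) = {predict(x, w, b)}")
```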
Objective: Choose w and b so the straight line fits the data well
Visual interpretation: Line should pass through or close to training examples
The cost function measures prediction accuracy by comparing predictions to actual values:
J(w,b) = 1/(2m) * Σ_{i=1}^{m} (f(x^(i)) - y^(i))²
Where:
m: Number of training examples
x^(i), y^(i): Input feature and target value of the i-th training example
f(x^(i)): The model's prediction for the i-th example

Key choices in the formula:
Average instead of total: Dividing by m prevents the cost from automatically increasing with larger datasets
Division by 2: Makes derivative calculations cleaner (cancels out later)
Squared errors: Penalizes large errors more than small errors
Since f(x^(i)) = wx^(i) + b, the cost can be written directly in terms of the parameters:
J(w,b) = 1/(2m) * Σ_{i=1}^{m} (wx^(i) + b - y^(i))²
This form emphasizes the relationship between the parameters and the predictions they produce.
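A minimal sketch of this cost computation in Python, assuming the training data is stored in NumPy arrays (the function name compute_cost and the example numbers are illustrative only):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost J(w,b) = 1/(2m) * sum_i (w*x_i + b - y_i)^2."""
    m = x.shape[0]               # number of training examples
    errors = (w * x + b) - y     # prediction minus target, per example
    return np.sum(errors ** 2) / (2 * m)

# Example usage on a tiny dataset where the line f(x) = 200x + 100 fits exactly
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([300.0, 500.0, 700.0])
print(compute_cost(x_train, y_train, w=200.0, b=100.0))   # prints 0.0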
Minimize J(w,b): Find parameter values that result in the smallest possible cost
Mathematical notation: min J(w,b) over w,b
Interpretation: Smaller cost means better fit to training data
The squared error cost function is the most commonly used cost function for linear regression and many other regression problems, providing good results across diverse applications.
To build intuition about the cost function, we'll use a simplified version of linear regression in which b is fixed at 0, so the model is f(x) = wx and the cost J(w) depends on a single parameter:
Simplification Benefits
Visualization: Easier to understand with 2D graphs instead of 3D
Concepts: Same principles apply to full model with both w and b
Goal: Minimize J(w) by finding optimal value of w
Training Examples: (1,1), (2,2), (3,3)
Pattern: Perfect linear relationship where y = x
w = 1:
Function: f(x) = 1·x = x
Predictions: f(1) = 1, f(2) = 2, f(3) = 3 (every prediction matches its target, so all errors are 0)
Cost calculation: J(1) = 1/(2·3) × (0² + 0² + 0²) = 0
w = 0.5:
Function: f(x) = 0.5x
Predictions: f(1) = 0.5, f(2) = 1, f(3) = 1.5 (errors of -0.5, -1, and -1.5, whose squares are 0.25, 1, and 2.25)
Cost calculation: J(0.5) = 1/(2·3) × (0.25 + 1 + 2.25) = 3.5/6 ≈ 0.58
w = 0:
Function: f(x) = 0 (horizontal line on x-axis)
Predictions: f(1) = 0, f(2) = 0, f(3) = 0 (errors of -1, -2, and -3, whose squares are 1, 4, and 9)
Cost calculation: J(0) = 1/(2·3) × (1 + 4 + 9) = 14/6 ≈ 2.33
w = -0.5:
Function: f(x) = -0.5x (downward sloping line)
Result: Even higher cost, J(-0.5) = 31.5/6 = 5.25
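A short sketch that reproduces the four cost values above for the simplified model (assuming NumPy; the function name is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # training inputs
y = np.array([1.0, 2.0, 3.0])   # training targets (y = x)

def compute_cost(x, y, w):
    """Cost J(w) for the simplified model f(x) = w*x (b fixed at 0)."""
    m = x.shape[0]
    return np.sum((w * x - y) ** 2) / (2 * m)

for w in [1.0, 0.5, 0.0, -0.5]:
    print(f"w = {w:>4}: J(w) = {compute_cost(x, y, w):.2f}")
# Expected output: 0.00, 0.58, 2.33, 5.25
```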
Model plot, f(x) versus x:
Horizontal axis: Input x (house size)
Vertical axis: Output y (price)
Points: Training examples plotted as crosses
Line: Function f(x) = wx for different values of w
Cost plot, J(w) versus w:
Horizontal axis: Parameter w
Vertical axis: Cost J(w)
Points: Each w value corresponds to one point on the cost curve
Shape: U-shaped curve (bowl shape)
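A sketch of both plots, assuming NumPy and Matplotlib are available (the axis labels and the candidate w shown on the left are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def compute_cost(x, y, w):
    m = x.shape[0]
    return np.sum((w * x - y) ** 2) / (2 * m)

fig, (ax_model, ax_cost) = plt.subplots(1, 2, figsize=(10, 4))

# Left plot: training data (crosses) and one candidate line f(x) = w*x
w_example = 0.5
ax_model.scatter(x, y, marker="x", color="red", label="training examples")
ax_model.plot(x, w_example * x, label=f"f(x) = {w_example}x")
ax_model.set_xlabel("x (house size)")
ax_model.set_ylabel("y (price)")
ax_model.legend()

# Right plot: cost J(w) over a range of w values (U-shaped curve)
w_values = np.linspace(-0.5, 2.5, 100)
costs = [compute_cost(x, y, w) for w in w_values]
ax_cost.plot(w_values, costs)
ax_cost.set_xlabel("w")
ax_cost.set_ylabel("J(w)")

plt.show()
```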
The systematic approach to finding the optimal w (and b in the full model) involves:
Evaluating the cost J(w) for a range of candidate values of w
Examining how J(w) changes as w changes (each candidate w contributes one point on the cost curve)
Choosing the value of w with the smallest cost, which corresponds to the line that best fits the training data
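As an illustration of this idea only, a simple grid search over candidate w values on the toy dataset picks out the best parameter (assuming NumPy; the grid range and step are arbitrary choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def compute_cost(x, y, w):
    m = x.shape[0]
    return np.sum((w * x - y) ** 2) / (2 * m)

# Evaluate the cost over a grid of candidate w values and keep the best one
w_values = np.linspace(-0.5, 2.5, 301)
costs = np.array([compute_cost(x, y, w) for w in w_values])
best_w = w_values[np.argmin(costs)]
print(f"best w = {best_w:.2f}, J = {costs.min():.4f}")   # best w is 1.00 with J = 0
```

In practice an optimization algorithm such as gradient descent automates this search rather than evaluating a grid of candidate values, but the objective is the same: find the parameters that minimize the cost.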
The cost function provides a systematic way to measure model performance. By understanding how different parameter values affect the cost, we can identify the best parameters for our model. The relationship between the model function f(x) and the cost function J(w) shows how parameter choices directly determine prediction quality and overall model performance.