Pablo Rodriguez

Supervised Learning

Supervised learning refers to algorithms that learn x-to-y, or input-to-output, mappings. The key characteristic is that the learning algorithm is given examples that include the right answers: the correct label y for a given input x.

By seeing correct pairs of input x and desired output label y, the learning algorithm eventually learns to take just the input alone and give reasonably accurate predictions.

Spam Filtering

  • Input: Email
  • Output: Spam or not spam classification

Speech Recognition

  • Input: Audio clip
  • Output: Text transcript

Machine Translation

  • Input: English text
  • Output: Spanish, Arabic, Hindi, Chinese, Japanese, etc.

Online Advertising

  • Input: Ad information + user information
  • Output: Probability of clicking the ad

Online advertising is the most lucrative form of supervised learning today.

  • Self-driving cars: Input images and sensor data to output positions of other cars
  • Manufacturing: Visual inspection using product images to detect scratches, dents, or defects

The training process follows these steps:

  1. Train the model with examples of inputs x and correct answers (labels y)
  2. After learning from input-output pairs, the model can take new inputs it has never seen
  3. The model produces appropriate corresponding outputs
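The three steps above can be sketched in a few lines of Python. The 1-nearest-neighbor rule below is an arbitrary illustrative choice (the notes do not prescribe a particular algorithm), and the house-size/price numbers are made up:

```python
# A minimal sketch of the supervised workflow: train on labeled (x, y)
# pairs, then predict on an input the model has never seen.

def train(examples):
    """'Training' here just stores the labeled examples."""
    return list(examples)

def predict(model, x):
    """Predict the label y of the stored example whose x is closest."""
    nearest = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Step 1: inputs x (house size, sq ft) with correct labels y (price, $1000s).
labeled = [(600, 150), (900, 200), (1200, 250)]
model = train(labeled)

# Steps 2-3: a new, unseen input gets an appropriate output.
print(predict(model, 650))   # nearest labeled example is (600, 150) -> 150
```

Any supervised algorithm follows this same shape: a training phase that consumes (x, y) pairs, then a prediction phase that consumes only x.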

Consider predicting housing prices based on house size:

  • Data collection: Plot house sizes (square feet) vs. prices (thousands of dollars)
  • Prediction goal: Determine price for a 750 square foot house
  • Approach options:
    • Fit a straight line to the data
    • Fit a curve for potentially better accuracy

The learning algorithm must choose the function that fits the data most appropriately in a systematic way, rather than picking whichever fit happens to produce the answer you want.
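The two fitting options can be sketched with NumPy's polynomial fitting. The housing numbers below are invented for illustration; they are not from the notes:

```python
import numpy as np

# Toy housing data: sizes in square feet, prices in thousands of dollars.
sizes = np.array([500, 750, 1000, 1250, 1500], dtype=float)
prices = np.array([150, 200, 260, 300, 370], dtype=float)

# Option 1: fit a straight line (degree-1 polynomial).
line = np.polyfit(sizes, prices, deg=1)

# Option 2: fit a curve (degree-2 polynomial) for potentially better accuracy.
curve = np.polyfit(sizes, prices, deg=2)

# Predict the price of a 750 square foot house under each fit.
print(np.polyval(line, 750))
print(np.polyval(curve, 750))
```

Note that the two fits give different predictions for the same 750 square foot house, which is exactly why the choice of function must be made systematically from the data.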

Regression

  • Predicts numerical values (e.g., 150,000 or 183,000)
  • Output can be any number within a range
  • Housing price prediction is a regression problem

Classification

  • Predicts categories or discrete classes
  • Limited set of possible outputs
  • Categories can be non-numeric (cat vs. dog) or numeric (0, 1, 2)
  • When numbers are used, they represent discrete categories, not continuous values

The key distinction is that regression predicts from infinite possibilities while classification predicts from a finite set of discrete categories.
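The infinite-versus-finite distinction can be made concrete with two hand-written prediction rules. The coefficients and the "small"/"large" categories are hypothetical, chosen only to show the shape of each output:

```python
# Regression vs. classification on the same input, sketched with
# hand-written rules (no learning involved).

def regression_predict(size_sqft):
    # Output lies on a continuum: any number in a range is possible.
    return 0.2 * size_sqft + 50.0

def classification_predict(size_sqft):
    # Output is drawn from a finite set of discrete categories.
    return "large" if size_sqft >= 1000 else "small"

print(regression_predict(666))      # a number like 183.2
print(classification_predict(666))  # only "small" or "large" is possible
```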

Supervised learning maps input x to output y, learning from examples with correct answers. The two major types are regression (predicting numbers) and classification (predicting categories), each suited for different types of prediction problems.

Classification addresses problems where the goal is to predict a small number of possible output categories. Consider breast cancer detection as a diagnostic tool:

  • Goal: Determine if a tumor (lump) is malignant (cancerous/dangerous) or benign (not cancerous)
  • Input: Patient’s medical records
  • Output: Binary classification (0 = benign, 1 = malignant)
  • Tumors of various sizes are labeled as either:
    • Benign (0): Not dangerous
    • Malignant (1): Cancerous and dangerous
  • Data can be plotted with tumor size on horizontal axis and binary classification (0/1) on vertical axis
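A minimal sketch of this binary setup is a single threshold on tumor size. The 3.0 cm threshold and the sizes below are arbitrary illustrative values with no medical meaning; a real system would learn the decision rule from labeled data:

```python
# Binary classification on one input feature: map tumor size to
# exactly one of two discrete labels, as on the 0/1 plot.

BENIGN, MALIGNANT = 0, 1

def classify(tumor_size_cm, threshold=3.0):
    """Return MALIGNANT (1) at or above the threshold, else BENIGN (0)."""
    return MALIGNANT if tumor_size_cm >= threshold else BENIGN

sizes = [1.2, 2.5, 3.1, 4.8]
print([classify(s) for s in sizes])  # [0, 0, 1, 1]
```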

Classification can extend beyond binary problems:

  • Multiple cancer types: Type 1, Type 2, etc.
  • Three possible outputs: Benign, Type 1 cancer, Type 2 cancer
  • Terms “output classes” and “output categories” are used interchangeably

Instead of just tumor size, consider both:

  • Age (in years)
  • Tumor size

With two inputs, data visualization shows:

  • Circles (○): Benign tumors
  • Crosses (×): Malignant tumors
  • Boundary line: Learning algorithm finds a decision boundary separating malignant from benign cases
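With two inputs, the boundary becomes a line in the age/size plane. The sketch below hard-codes hypothetical weights and a bias to show what such a linear decision boundary computes; a learning algorithm would fit these values from the labeled circles and crosses:

```python
# Sketch of a linear decision boundary over two inputs (age, tumor size).
# The weights and bias are chosen by hand for illustration only.

def predict(age, tumor_size_cm, w_age=0.02, w_size=1.0, bias=-4.0):
    """Classify as malignant (1) if the point falls on the positive side
    of the boundary w_age*age + w_size*size + bias = 0, else benign (0)."""
    score = w_age * age + w_size * tumor_size_cm + bias
    return 1 if score > 0 else 0

print(predict(age=40, tumor_size_cm=2.0))  # 0.02*40 + 2.0 - 4.0 = -1.2 -> 0
print(predict(age=60, tumor_size_cm=4.5))  # 0.02*60 + 4.5 - 4.0 =  1.7 -> 1
```

Points on one side of the line are predicted benign, points on the other side malignant; learning amounts to choosing the weights and bias that best separate the training data.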

Real breast cancer detection systems use many additional inputs:

  • Thickness of tumor clump
  • Uniformity of cell size
  • Uniformity of cell shape
  • And many other medical features

Classification Properties

  • Categories don’t have to be numbers: Can predict “cat” vs “dog”
  • Numbers as categories: When used (0, 1, 2), they represent discrete categories, not continuous values
  • Limited output set: Finite number of possible predictions
  • No in-between values: Unlike regression, doesn’t predict values like 0.5 or 1.7

The learning algorithm determines how to fit a boundary through the data to separate different classes. This boundary helps make predictions for new patients based on their age and tumor characteristics.

Classification algorithms predict categories from a small, finite set of possible outputs. Whether dealing with two categories (benign/malignant) or multiple categories, the goal is to learn decision boundaries that can accurately classify new examples based on their input features. The key difference from regression is the discrete, limited nature of possible outputs rather than continuous numerical predictions.