Pablo Rodriguez

Supervised Learning

Supervised learning refers to algorithms that learn x-to-y, or input-to-output, mappings. The key characteristic is that the learning algorithm is given examples that include the right answers: the correct label y for a given input x.

By seeing correct pairs of input x and desired output label y, the learning algorithm eventually learns to take just the input alone and give reasonably accurate predictions.

Spam Filtering

  • Input: Email
  • Output: Spam or not spam classification

Speech Recognition

  • Input: Audio clip
  • Output: Text transcript

Machine Translation

  • Input: English text
  • Output: Spanish, Arabic, Hindi, Chinese, Japanese, etc.

Online Advertising

  • Input: Ad information + user information
  • Output: Probability of clicking the ad

Online advertising is the most lucrative form of supervised learning today.

  • Self-driving cars: Input images and sensor data to output positions of other cars
  • Manufacturing: Visual inspection using product images to detect scratches, dents, or defects

The training process follows these steps:

  1. Train the model with examples of inputs x and correct answers (labels y)
  2. After learning from input-output pairs, the model can take new inputs it has never seen
  3. The model produces appropriate corresponding outputs
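The three steps above can be sketched in a few lines of Python. The 1-nearest-neighbor rule below is an arbitrary illustrative choice (the notes do not prescribe a particular algorithm), and the house-size/price numbers are made up:

```python
# A minimal sketch of the supervised workflow: train on labeled (x, y)
# pairs, then predict on an input the model has never seen.

def train(examples):
    """'Training' here just stores the labeled examples."""
    return list(examples)

def predict(model, x):
    """Predict the label y of the stored example whose x is closest."""
    nearest = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Step 1: inputs x (house size, sq ft) with correct labels y (price, $1000s).
labeled = [(600, 150), (900, 200), (1200, 250)]
model = train(labeled)

# Steps 2-3: a new, unseen input gets an appropriate output.
print(predict(model, 650))   # nearest labeled example is (600, 150) -> 150
```

Any supervised algorithm follows this same shape: a training phase that consumes (x, y) pairs, then a prediction phase that consumes only x.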

Consider predicting housing prices based on house size:

  • Data collection: Plot house sizes (square feet) vs. prices (thousands of dollars)
  • Prediction goal: Determine price for a 750 square foot house
  • Approach options:
    • Fit a straight line to the data
    • Fit a curve for potentially better accuracy

The learning algorithm must choose the function that fits the data most appropriately in a systematic way, rather than picking whichever fit happens to produce the answer you want.
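The two fitting options can be sketched with NumPy's polynomial fitting. The housing numbers below are invented for illustration; they are not from the notes:

```python
import numpy as np

# Toy housing data: sizes in square feet, prices in thousands of dollars.
sizes = np.array([500, 750, 1000, 1250, 1500], dtype=float)
prices = np.array([150, 200, 260, 300, 370], dtype=float)

# Option 1: fit a straight line (degree-1 polynomial).
line = np.polyfit(sizes, prices, deg=1)

# Option 2: fit a curve (degree-2 polynomial) for potentially better accuracy.
curve = np.polyfit(sizes, prices, deg=2)

# Predict the price of a 750 square foot house under each fit.
print(np.polyval(line, 750))
print(np.polyval(curve, 750))
```

Note that the two fits give different predictions for the same 750 square foot house, which is exactly why the choice of function must be made systematically from the data.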

Regression

  • Predicts numerical values (e.g., 150,000 or 183,000)
  • Output can be any number within a range
  • Housing price prediction is a regression problem

Classification

  • Predicts categories or discrete classes
  • Limited set of possible outputs
  • Categories can be non-numeric (cat vs. dog) or numeric (0, 1, 2)
  • When numbers are used, they represent discrete categories, not continuous values

The key distinction is that regression predicts from infinite possibilities while classification predicts from a finite set of discrete categories.
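The infinite-versus-finite distinction can be made concrete with two hand-written prediction rules. The coefficients and the "small"/"large" categories are hypothetical, chosen only to show the shape of each output:

```python
# Regression vs. classification on the same input, sketched with
# hand-written rules (no learning involved).

def regression_predict(size_sqft):
    # Output lies on a continuum: any number in a range is possible.
    return 0.2 * size_sqft + 50.0

def classification_predict(size_sqft):
    # Output is drawn from a finite set of discrete categories.
    return "large" if size_sqft >= 1000 else "small"

print(regression_predict(666))      # a number like 183.2
print(classification_predict(666))  # only "small" or "large" is possible
```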

Supervised learning maps input x to output y, learning from examples with correct answers. The two major types are regression (predicting numbers) and classification (predicting categories), each suited for different types of prediction problems.

Classification addresses problems where the goal is to predict a small number of possible output categories. Consider breast cancer detection as a diagnostic tool:

  • Goal: Determine if a tumor (lump) is malignant (cancerous/dangerous) or benign (not cancerous)
  • Input: Patient’s medical records
  • Output: Binary classification (0 = benign, 1 = malignant)
  • Tumors of various sizes are labeled as either:
    • Benign (0): Not dangerous
    • Malignant (1): Cancerous and dangerous
  • Data can be plotted with tumor size on horizontal axis and binary classification (0/1) on vertical axis
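A minimal sketch of this binary setup is a single threshold on tumor size. The 3.0 cm threshold and the sizes below are arbitrary illustrative values with no medical meaning; a real system would learn the decision rule from labeled data:

```python
# Binary classification on one input feature: map tumor size to
# exactly one of two discrete labels, as on the 0/1 plot.

BENIGN, MALIGNANT = 0, 1

def classify(tumor_size_cm, threshold=3.0):
    """Return MALIGNANT (1) at or above the threshold, else BENIGN (0)."""
    return MALIGNANT if tumor_size_cm >= threshold else BENIGN

sizes = [1.2, 2.5, 3.1, 4.8]
print([classify(s) for s in sizes])  # [0, 0, 1, 1]
```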

Classification can extend beyond binary problems:

  • Multiple cancer types: Type 1, Type 2, etc.
  • Three possible outputs: Benign, Type 1 cancer, Type 2 cancer
  • Terms “output classes” and “output categories” are used interchangeably

Instead of just tumor size, consider both:

  • Age (in years)
  • Tumor size

With two inputs, data visualization shows:

  • Circles (○): Benign tumors
  • Crosses (×): Malignant tumors
  • Boundary line: Learning algorithm finds a decision boundary separating malignant from benign cases
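With two inputs, the boundary becomes a line in the age/size plane. The sketch below hard-codes hypothetical weights and a bias to show what such a linear decision boundary computes; a learning algorithm would fit these values from the labeled circles and crosses:

```python
# Sketch of a linear decision boundary over two inputs (age, tumor size).
# The weights and bias are chosen by hand for illustration only.

def predict(age, tumor_size_cm, w_age=0.02, w_size=1.0, bias=-4.0):
    """Classify as malignant (1) if the point falls on the positive side
    of the boundary w_age*age + w_size*size + bias = 0, else benign (0)."""
    score = w_age * age + w_size * tumor_size_cm + bias
    return 1 if score > 0 else 0

print(predict(age=40, tumor_size_cm=2.0))  # 0.02*40 + 2.0 - 4.0 = -1.2 -> 0
print(predict(age=60, tumor_size_cm=4.5))  # 0.02*60 + 4.5 - 4.0 =  1.7 -> 1
```

Points on one side of the line are predicted benign, points on the other side malignant; learning amounts to choosing the weights and bias that best separate the training data.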

Real breast cancer detection systems use many additional inputs:

  • Thickness of tumor clump
  • Uniformity of cell size
  • Uniformity of cell shape
  • And many other medical features

Classification Properties

  • Categories don’t have to be numbers: Can predict “cat” vs “dog”
  • Numbers as categories: When used (0, 1, 2), they represent discrete categories, not continuous values
  • Limited output set: Finite number of possible predictions
  • No in-between values: Unlike regression, doesn’t predict values like 0.5 or 1.7

The learning algorithm determines how to fit a boundary through the data to separate different classes. This boundary helps make predictions for new patients based on their age and tumor characteristics.

Classification algorithms predict categories from a small, finite set of possible outputs. Whether dealing with two categories (benign/malignant) or multiple categories, the goal is to learn decision boundaries that can accurately classify new examples based on their input features. The key difference from regression is the discrete, limited nature of possible outputs rather than continuous numerical predictions.