Pablo Rodriguez

Unsupervised Learning

Unsupervised learning works with data that isn’t associated with any output labels y. Instead of being given “right answers,” the algorithm must find structure, patterns, or interesting insights in the data on its own.

For contrast, in supervised learning:

  • Data comes with both inputs x and output labels y
  • Example: Patient data with known benign/malignant classifications
  • Goal: Learn to predict labels for new data

The most common type of unsupervised learning is clustering, which groups similar data points together.
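As a rough illustration of what "grouping similar data points" can mean in practice, here is a minimal sketch of k-means, one common clustering algorithm. The text above does not name a specific algorithm, so treat k-means as an assumption made for the example. The function name `kmeans`, the toy points, and the choice to start from the first k points are also illustrative.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, labels)."""
    # Assumption: start from the first k points. Real implementations
    # usually pick random starting centroids instead.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, labels

# Toy data: two visibly separate blobs, with no labels supplied.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
          (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Note that the code never sees a "right answer" for any point. The cluster numbers it prints are arbitrary names for groups it found on its own.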

Google News

  • Process: Analyzes hundreds of thousands of daily news articles
  • Result: Groups related stories together automatically
  • Example: Articles about “giant panda gives birth to twin cubs” are clustered with other panda-related stories
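The idea of grouping stories by shared words can be sketched in a few lines. This is only a toy under stated assumptions, not how Google News actually works. The helper names `words` and `group_headlines`, the similarity threshold, and every headline except the panda one are made up for illustration.

```python
def words(headline):
    """Lowercased set of words, ignoring very short ones like 'to'."""
    return {w for w in headline.lower().split() if len(w) > 3}

def group_headlines(headlines, threshold=0.2):
    """Greedy grouping: a headline joins the first group whose first
    headline shares enough words with it, else it starts a new group."""
    groups = []
    for h in headlines:
        for g in groups:
            a, b = words(h), words(g[0])
            # Jaccard similarity: shared words / all distinct words.
            if a | b and len(a & b) / len(a | b) >= threshold:
                g.append(h)
                break
        else:
            groups.append([h])
    return groups

headlines = [
    "Giant panda gives birth to twin cubs",
    "Zoo celebrates twin panda cubs",          # hypothetical headline
    "Central bank raises interest rates",      # hypothetical headline
    "Markets react as interest rates rise",    # hypothetical headline
]
groups = group_headlines(headlines)
for g in groups:
    print(g)
```

With these inputs the two panda headlines end up in one group and the two interest-rate headlines in another, even though the code was never told which words matter.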

DNA Analysis

  • Process: Analyzes genetic micro-array data
  • Result: Groups individuals into different genetic types
  • Data: Each column represents one person’s DNA, each row represents a gene

Market Segmentation

  • Process: Analyzes customer databases
  • Result: Groups customers into different market segments
  • Benefit: More efficient customer service strategies

In the Google News example, the algorithm automatically discovers that articles mentioning similar words should be grouped together:

  • Common words found: “panda,” “twin,” “zoo”
  • No human supervision: No Google employee tells the algorithm which words to look for
  • Daily adaptation: Topics change daily, making manual classification impractical
  • Automatic discovery: The algorithm identifies clustering patterns without explicit guidance

In the DNA analysis example, the micro-array data looks like this:

  • Data structure: Grid-like spreadsheet format
  • Columns: Individual people’s genetic data
  • Rows: Specific genes (e.g., eye color, height, vegetable preferences)
  • Color coding: Red, green, gray show gene expression levels
  • Output: Groups like “Type 1,” “Type 2,” “Type 3” people based on genetic similarities

Beyond clustering, this specialization covers two other types of unsupervised learning: anomaly detection and dimensionality reduction.

As a market segmentation example, DeepLearning.AI’s analysis of their community revealed distinct learner groups:

  • Knowledge seekers: Primary motivation is growing skills
  • Career developers: Focus on promotions, new jobs, career progression
  • Industry updaters: Want to stay current on AI impacts in their field
  • Other motivations: Additional unique categories beyond the main three

Unsupervised learning algorithms:

  • Figure out patterns “all by themselves”
  • Don’t receive “right answers” during training
  • Must discover structure without explicit guidance
  • Adapt to changing data patterns automatically
  • Find hidden relationships in complex datasets

Understanding unsupervised learning helps you recognize when data contains hidden structure that can be discovered automatically, even without knowing what to look for in advance.

Unsupervised vs Supervised Learning

Supervised Learning: Data comes with both inputs x and output labels y

Unsupervised Learning: Data comes only with inputs x, no output labels y

  • Algorithm must find structure, patterns, or interesting discoveries in data
  • No “right answers” provided during training

This specialization covers three types of unsupervised learning:

Clustering

  • Purpose: Groups similar data points together
  • Example: Customer segmentation, DNA analysis, news article grouping
  • Process: Algorithm identifies natural groupings in data without predefined categories

Anomaly Detection

  • Purpose: Detects unusual events or outliers
  • Key Application: Fraud detection in financial systems
    • Unusual transactions may indicate fraudulent activity
    • Critical for maintaining financial security
  • Other Uses: System monitoring, quality control, network security

Dimensionality Reduction

  • Purpose: Compresses large datasets while preserving essential information
  • Benefit: “Almost magically” reduces data size with minimal information loss
  • Applications: Data visualization, storage optimization, preprocessing for other algorithms
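To make "compressing data while preserving essential information" concrete, here is a minimal sketch that reduces 2-D points to one number each by projecting them onto their direction of greatest variance. This is the two-dimensional special case of principal component analysis (PCA), a technique the text above does not name, so treat that choice as an assumption. The function names `compress_2d_to_1d` and `reconstruct` and the sample points are illustrative.

```python
from math import atan2, cos, sin

def compress_2d_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the principal (major) axis of that matrix.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    ux, uy = cos(theta), sin(theta)
    # One coordinate per point: its position along the principal axis.
    codes = [(x - mx) * ux + (y - my) * uy for x, y in points]
    return codes, (mx, my), (ux, uy)

def reconstruct(codes, mean, direction):
    """Approximate the original points from their 1-D codes."""
    return [(mean[0] + c * direction[0], mean[1] + c * direction[1])
            for c in codes]

# Toy data lying close to a straight line, so little is lost.
points = [(0.0, 0.1), (1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]
codes, mean, direction = compress_2d_to_1d(points)
restored = reconstruct(codes, mean, direction)
worst = max(abs(x - rx) + abs(y - ry)
            for (x, y), (rx, ry) in zip(points, restored))
print("worst reconstruction error:", round(worst, 3))
```

For n points this stores n codes plus four shared numbers (the mean and the direction) instead of 2n coordinates, so the saving only shows up on larger datasets. How much information is lost depends on how closely the data follows a single direction.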

Here is an example to help distinguish between supervised and unsupervised learning:

Spam Filtering

  • Type: Supervised Learning
  • Reason: Has labeled data (spam vs. non-spam emails)
  • Approach: Train model with known examples

Supervised Learning Characteristics:

  • Has target labels or “right answers”
  • Goal is to predict labels for new data
  • Examples: classification, regression problems

Unsupervised Learning Characteristics:

  • No target labels provided
  • Goal is to discover hidden structure
  • Examples: clustering, anomaly detection, dimensionality reduction

Understanding these types helps you determine the appropriate approach for different data analysis problems:

  • Known outcomes: Use supervised learning
  • Unknown patterns: Use unsupervised learning
  • Unusual events: Consider anomaly detection
  • Complex high-dimensional data: Consider dimensionality reduction
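For the "unusual events" case, the sketch below flags transaction amounts that sit far from the typical value, using the modified z-score based on the median absolute deviation. This is one simple rule chosen for illustration, not necessarily the method the specialization teaches. The function name `flag_unusual`, the threshold of 3.5, and the amounts are assumptions.

```python
from statistics import median

def flag_unusual(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that a single huge
    # value cannot distort the way it distorts a standard deviation.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to compare against; this sketch gives up
    return [a for a in amounts
            if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical transaction amounts; nothing is labeled as fraud.
amounts = [12.5, 9.99, 15.0, 11.25, 14.0, 980.0, 13.4]
print(flag_unusual(amounts))  # [980.0]
```

As with clustering, no example is marked "fraudulent" in advance. The rule only learns what typical looks like and reports what deviates from it, which a person would then need to review.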

The choice between supervised and unsupervised learning depends on whether you have labeled training data and what insights you’re trying to discover from your dataset.