Pablo Rodriguez

What Is Clustering

A clustering algorithm looks at a number of data points and automatically finds data points that are related or similar to each other.

Supervised vs Unsupervised Learning Context

Supervised learning:

  • Given a dataset with features x₁ and x₂
  • Training set includes both input features x and target labels y
  • Can fit logistic regression or a neural network to learn a decision boundary

Unsupervised learning:

  • Given a dataset with just the inputs x, but no target labels y
  • Plot shows just dots rather than two classes (x’s and o’s)
  • Cannot tell the algorithm what the “right answer” to predict is
  • Instead, ask the algorithm to find some interesting structure in the data
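
The difference between the two settings shows up directly in the shape of the data. The sketch below uses made-up values, and the names `labeled_data` and `unlabeled_data` are illustrative assumptions rather than anything from the original notes.

```python
# Hypothetical toy data; all values are made up for illustration.

# Supervised setting: each example pairs input features (x1, x2) with a label y.
labeled_data = [
    ((1.0, 1.5), 0),
    ((1.2, 0.9), 0),
    ((4.8, 5.1), 1),
    ((5.3, 4.7), 1),
]

# Unsupervised setting: the same inputs, but with no label y attached.
unlabeled_data = [features for features, _label in labeled_data]

# Only the supervised setting has a "right answer" for every example.
targets = [label for _features, label in labeled_data]

print(unlabeled_data)  # only (x1, x2) pairs remain
print(targets)         # [0, 0, 1, 1]
```

A clustering algorithm would receive only `unlabeled_data`; nothing like `targets` is available to it.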

Clustering looks for one particular type of structure in the data:

  • Tries to see if data can be grouped into clusters
  • Groups points that are similar to each other
  • Finds that a dataset comprises data from multiple clusters

Applications of clustering:

  • News articles: grouping similar articles together
    • Example: stories about pandas
  • Learners at deeplearning.ai: discovered that learners come for different reasons:
    • Want to grow skills
    • Develop careers
    • Stay updated with AI and understand how it affects their field of work
    • The aim is to help everyone, whatever their goal, learn about machine learning
  • Genetic expression data: look at data from different individuals
    • Group them into people who exhibit similar traits
  • Astronomy: in space exploration, astronomers use clustering to group bodies in space together for analysis
    • Figure out which ones form one galaxy
    • Determine which ones form coherent structures in space
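
To make the idea of grouping similar points concrete, here is a minimal sketch using k-means, one common clustering algorithm. These notes do not name a specific algorithm, so the choice of k-means, the toy points, the hand-picked starting centroids, and the helper names (`assign_clusters`, `update_centroids`, `k_means`) are all assumptions made for illustration.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == j]
        if not members:
            # Keep the centroid of an empty cluster where it is.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = assign_clusters(points, centroids)
    for _ in range(max_iters):
        centroids = update_centroids(points, assignments, centroids)
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return assignments, centroids


# Hypothetical unlabeled data: only (x1, x2) inputs, no target labels y.
points = [(1.0, 1.5), (1.2, 0.9), (0.8, 1.1), (4.8, 5.1), (5.3, 4.7), (5.0, 5.5)]

# Starting centroids are picked by hand here so the run is deterministic.
assignments, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which group a point belongs to; the two clusters emerge only from the distances between points. In practice the starting centroids are usually chosen at random, and different starts can give different groupings.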

Clustering represents a fundamental approach in unsupervised learning where we discover hidden patterns and structures in data without having predefined labels or target outputs to guide the learning process.