What Is Clustering

Definition and Overview

A clustering algorithm looks at a number of data points and automatically finds data points that are related or similar to each other.

Supervised vs Unsupervised Learning Context

Supervised Learning (Review)

Given a dataset with features x₁ and x₂
Training set includes both input features x and target labels y
Can fit logistic regression or a neural network to learn a decision boundary

Unsupervised Learning (Clustering)

Given a dataset with only input features x and no target labels y
Plot shows unlabeled dots rather than two classes (x’s and o’s)
Cannot tell the algorithm what the “right answer” to predict is
Instead, ask the algorithm to find some interesting structure in the data
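
The difference between the two settings comes down to what the training data contains. A minimal sketch in Python, where the variable names and all values are made up purely for illustration:

```python
# Hypothetical toy data; the numbers are invented for illustration only.

# Supervised: each example pairs input features (x1, x2) with a label y.
labeled_data = [
    ((1.0, 1.2), 0),
    ((0.8, 1.0), 0),
    ((4.1, 3.9), 1),
    ((3.8, 4.2), 1),
]

# Unsupervised: only the input features x are available, with no labels y.
unlabeled_data = [x for x, _y in labeled_data]

print(unlabeled_data)
```

A clustering algorithm would receive only something like unlabeled_data and would have to find structure without ever seeing the y values.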

What Clustering Does

Clustering looks for one particular type of structure in the data:

Tries to see if the data can be grouped into clusters
Groups points that are similar to each other
May find, for example, that the dataset consists of points from multiple clusters

Applications of Clustering

News Article Grouping

Grouping similar news articles together
Example: grouping stories about pandas

Market Segmentation

At deeplearning.ai, market segmentation revealed that learners come for different reasons:
- Grow their skills
- Develop their careers
- Stay updated with AI and understand how it affects their field of work
Goal: help learners with any of these motivations learn about machine learning

DNA Data Analysis

Look at genetic expression data from different individuals
Group individuals who exhibit similar traits

Astronomical Data Analysis

Astronomers use clustering for space exploration
Group bodies in space together for analysis
Figure out which bodies form a single galaxy
Determine which bodies form coherent structures in space

Clustering represents a fundamental approach in unsupervised learning where we discover hidden patterns and structures in data without having predefined labels or target outputs to guide the learning process.