What Is Clustering
Definition and Overview
A clustering algorithm looks at a set of data points and automatically finds the ones that are related or similar to each other.
Supervised vs Unsupervised Learning Context
Supervised Learning (Review)
- Given a dataset with features x₁ and x₂
- The training set includes both the input features x and the target labels y
- We can fit logistic regression or a neural network to learn a decision boundary
Unsupervised Learning (Clustering)
- Given a dataset with only the inputs x, and no target labels y
- A plot of the data shows just dots rather than two classes (x’s and o’s)
- Because there are no labels, we cannot tell the algorithm what the “right answer” to predict is
- Instead, we ask the algorithm to find some interesting structure in the data
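To make the contrast concrete, here is a minimal sketch of how the two kinds of dataset might be represented in Python. The values and the names `supervised_data`, `unsupervised_data`, and `labels` are made up for illustration and do not come from the original notes.

```python
# Hypothetical toy data: each input x has two features (x1, x2).

# Supervised learning: every input x comes paired with a label y (0 or 1).
supervised_data = [
    ((1.0, 1.2), 0),
    ((0.8, 1.0), 0),
    ((3.1, 3.0), 1),
    ((2.9, 3.3), 1),
]

# A supervised learner can check its predictions against these labels.
labels = [y for _x, y in supervised_data]

# Unsupervised learning: the same inputs, but with the labels removed.
unsupervised_data = [x for x, _y in supervised_data]

# A clustering algorithm only ever sees the x values.
print(unsupervised_data)
```

The only difference between the two lists is the missing y, which is exactly what separates the unsupervised setting from the supervised one.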
What Clustering Does
Clustering looks for one particular type of structure in the data:
- Tries to see whether the data can be grouped into clusters
- Groups points that are similar to each other
- May find, for example, that the dataset comprises data from multiple clusters
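As one possible illustration of grouping similar points, the sketch below uses k-means. This section does not name a specific algorithm, so k-means is an assumption chosen here only as an example. The data, the function names, and the hand-picked starting centroids are all hypothetical, and "similar" is taken to mean "close in Euclidean distance".

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return new_centroids

def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update steps until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iters):
        new_assignments = assign_clusters(points, centroids)
        if new_assignments == assignments:
            break
        assignments = new_assignments
        centroids = update_centroids(points, assignments, centroids)
    return assignments, centroids

# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.1, 3.0), (2.9, 3.3), (3.2, 2.8)]

# Starting centroids are picked by hand so the run is repeatable.
assignments, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(assignments)  # prints [0, 0, 0, 1, 1, 1]
```

No labels are passed in anywhere: the cluster indices in `assignments` are discovered from the positions of the points alone. In practice the number of clusters and the starting centroids are choices the user has to make, and different choices can give different groupings.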
Applications of Clustering
News Article Grouping
- Grouping similar news articles together
- Example: collecting the stories about pandas into one group
Market Segmentation
- At deeplearning.ai, the team discovered that learners come for different reasons:
  - To grow their skills
  - To develop their careers
  - To stay updated with AI and understand how it affects their field of work
- The aim is to help learners with any of these goals learn about machine learning
DNA Data Analysis
- Look at genetic expression data from different individuals
- Group the individuals into sets of people who exhibit similar traits
Astronomical Data Analysis
- Astronomers use clustering to analyze astronomical data
- Group bodies in space together for analysis
- Figure out which bodies form one galaxy
- Determine which bodies form coherent structures in space
Clustering represents a fundamental approach in unsupervised learning where we discover hidden patterns and structures in data without having predefined labels or target outputs to guide the learning process.