Primary Use Case
PCA is commonly used by data scientists to visualize the data, to figure out what might be going on in their datasets.
Principal Component Analysis (PCA) is an unsupervised learning algorithm commonly used for visualization. If you have a dataset with a lot of features - say 10, 50, or even thousands - you cannot plot it directly, because there is no practical way to draw 1,000-dimensional data.
PCA lets you take data with a lot of features (50, 1,000, even more) and reduce it to two or three features, so that you can plot and visualize it.
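As a rough illustration of this reduction, here is a minimal sketch that projects data onto its top two principal components using NumPy's SVD. The helper name `pca_reduce` and the random data are assumptions made for the example, not part of the original notes. In practice, features measured on very different scales are usually standardized first, which this sketch skips.

```python
import numpy as np

def pca_reduce(X, k=2):
    """Project the rows of X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T             # shape: (n_samples, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))               # synthetic stand-in: 200 items, 50 features
Z = pca_reduce(X, k=2)
print(Z.shape)                               # (200, 2)
```

Each row of `Z` holds two numbers for one item, which is what you would plot on a 2D scatter plot.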
To illustrate PCA, consider a dataset from a collection of passenger cars with many features:
The question is: how can you use PCA to reduce the number of features for visualization?
In most countries, because of road width constraints, car width tends not to vary much. Most cars are about 1.8 meters wide (just under six feet). If you plot length (x₁) against width (x₂), x₁ varies quite a bit while x₂ varies relatively little.
For feature reduction, you could simply take x₁ since x₂ varies little from car to car. PCA will more or less automatically decide to just take x₁.
Again, PCA would essentially choose the feature x₁ when applied to this dataset.
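To see this behavior on numbers, the sketch below builds a made-up car dataset in which length varies a lot and width barely varies, then inspects the first principal direction. The means and spreads are invented for illustration only. The data is left in meters and is not standardized, since rescaling both features to unit variance would hide the very difference in spread that the example relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
length = rng.normal(4.5, 0.5, size=300)      # x1: varies a lot (made-up numbers, meters)
width = rng.normal(1.8, 0.03, size=300)      # x2: nearly constant (made-up numbers, meters)
X = np.column_stack([length, width])

X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
first_direction = Vt[0]                      # unit vector of the first principal component
print(np.abs(first_direction))               # close to [1, 0]: almost entirely x1
```

The first principal direction lies almost exactly along the x₁ axis, so projecting onto it is nearly the same as keeping length and dropping width.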
A more interesting case is a dataset where x₁ is the car's length and x₂ is its height: some cars are bigger (longer and taller) and some cars are smaller (not as long and not as tall).
For feature reduction, you don’t want to pick just x₁ and ignore x₂, nor pick just x₂ and ignore x₁, since both have useful information.
Instead of being limited to the x₁ axis or x₂ axis, PCA introduces a new axis called the z-axis. This z-axis:
The idea of PCA is to find one or more new axes (such as z) so that when you measure your data’s coordinates on the new axis, you end up with very useful information about the items (cars in this example).
Instead of needing two numbers (coordinates on x₁ and x₂ axes for length and height), you now need fewer numbers - in this case, only one number instead of two - to capture roughly the size of the car.
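The idea of replacing two coordinates with one can be sketched as follows. The data is synthetic: a hidden "size" factor (a hypothetical variable introduced only for this example) drives both length and height, and the coefficients are made up. The z-axis is taken as the first principal direction, and each car's coordinate on it is a single number.

```python
import numpy as np

rng = np.random.default_rng(0)
size = rng.normal(0.0, 1.0, size=300)                     # hidden "size" factor (hypothetical)
length = 4.5 + 0.5 * size + rng.normal(0, 0.05, 300)      # x1, in meters (made-up model)
height = 1.6 + 0.2 * size + rng.normal(0, 0.05, 300)      # x2, in meters (made-up model)
X = np.column_stack([length, height])

mean = X.mean(axis=0)
X_centered = X - mean
_, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
z_axis = Vt[0]                                            # unit vector along the new axis
z = X_centered @ z_axis                                   # one number per car

X_approx = mean + np.outer(z, z_axis)                     # rebuild (length, height) from z alone
explained = s[0] ** 2 / np.sum(s ** 2)                    # share of total variance kept by z
print(z.shape, round(float(explained), 3))
```

Here `z` tracks the hidden size factor closely (up to sign, since the direction of a principal axis is arbitrary), and `X_approx` shows how much of the original length and height can be recovered from that one number.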
In practice, PCA is usually used to reduce a very large number of features:
Consider data about different countries with 50 features:
PCA can compress these 50 features down to two features (z₁ and z₂) for visualization:
Possible interpretations:
Country examples:
This approach lets you take 50-dimensional data (50 features) and reduce it to 2-dimensional data, enabling you to:
Data Exploration Value
Whenever working with a new dataset, visualizing the data helps understand what the data looks like and can reveal if something unexpected is happening in the dataset.
PCA provides a powerful way to take high-dimensional data and make it understandable through visualization, helping data scientists gain insights that would be impossible to see in the original high-dimensional space.