Principal Component Definition
In the PCA algorithm, the optimal axis is called the principal component: the axis that, when you project the data onto it, gives you the largest possible amount of variance.
PCA works by finding new axes to represent your data. If you have a dataset with two features, x₁ and x₂, your data is initially plotted using those two axes. To reduce the number of features, you choose a new axis (the z-axis) that captures the data well with fewer dimensions.
Before applying PCA, features should be normalized to have zero mean (subtract the mean from each feature).
If features take on very different scales, also perform feature scaling before PCA; without scaling, the large difference in ranges can keep PCA from finding good axes.
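A minimal sketch of this preprocessing step with NumPy (the five rows below are made-up data, chosen only so that the first feature is in the thousands while the second ranges from 1 to 5):

```python
import numpy as np

# Made-up example data (not from the notes): the two features have very
# different scales, so both centering and scaling are applied.
X = np.array([[1000.0, 2.0],
              [2000.0, 3.0],
              [3000.0, 5.0],
              [1500.0, 2.0],
              [2500.0, 4.0]])

X_centered = X - X.mean(axis=0)        # zero-mean normalization
X_scaled = X_centered / X.std(axis=0)  # feature scaling for very different ranges

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```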
Given five training examples, PCA must choose one axis instead of the original two to capture what’s important about the data.
Projection process: take each training example and project it onto the candidate axis, so that a single number (its position along that axis) stands in for its original two coordinates.
If you choose an axis where the projected points end up squished together, the projections have little variance and capture little of the information in the original data.
If you choose an axis where the projected points are spread apart, the projections have large variance and retain much more of the original information (see the sketch below).
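To make the contrast concrete, here is a small sketch (made-up zero-mean points and two arbitrary candidate directions, not from the original notes) that projects the same data onto each axis and compares the variance of the resulting one-dimensional values:

```python
import numpy as np

# Five made-up, zero-mean 2-D training examples (illustrative only).
X = np.array([[ 2.0,  1.9],
              [ 1.0,  1.1],
              [ 0.0, -0.1],
              [-1.0, -0.9],
              [-2.0, -2.0]])

# Two candidate axes, each approximately a length-1 (unit) vector.
axis_spread   = np.array([0.707, 0.707])   # roughly along the data's direction
axis_squished = np.array([0.707, -0.707])  # roughly perpendicular to it

# Projecting every example onto an axis is just a dot product per example.
z_spread = X @ axis_spread
z_squished = X @ axis_squished

print(z_spread.var())    # large: projections are spread apart
print(z_squished.var())  # small: projections are squished together
```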
For a training example with coordinates (2, 3) and a principal component axis defined by vector [0.71, 0.71]:
Projection formula:
projection = dot_product([2, 3], [0.71, 0.71]) = 2 × 0.71 + 3 × 0.71 = 3.55
This means the distance from the origin to the projected point is 3.55, giving us one number to represent this example instead of two.
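The same dot product can be checked numerically, for example with NumPy (using the rounded vector [0.71, 0.71] from the notes):

```python
import numpy as np

x = np.array([2.0, 3.0])     # the training example
u = np.array([0.71, 0.71])   # principal component direction (rounded)

z = np.dot(x, u)             # projection onto the z-axis
print(z)                     # approximately 3.55
```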
The principal component direction is represented as a length-1 (unit) vector pointing in the direction of the z-axis. In this example it is [0.71, 0.71], which is [0.707, 0.707] (that is, [1/√2, 1/√2]) with more precision.
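Those numbers come from rescaling the direction [1, 1] to length 1, for example:

```python
import numpy as np

v = np.array([1.0, 1.0])    # direction of the chosen z-axis
u = v / np.linalg.norm(v)   # rescale to length 1
print(u)                    # [0.70710678 0.70710678]
```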
If you had 50 features and wanted three principal components, PCA would find three mutually perpendicular axes (z₁, z₂, z₃) and project each example onto all three, representing it with three numbers instead of 50.
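As a sketch of how this looks in practice with scikit-learn's PCA class (the 50-feature data here is randomly generated stand-in data, since the notes don't specify a dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 200 examples with 50 features (randomly generated for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

pca = PCA(n_components=3)    # ask for three principal components
Z = pca.fit_transform(X)     # scikit-learn centers the data internally

print(Z.shape)                # (200, 3): three numbers per example
print(pca.components_.shape)  # (3, 50): three length-1 direction vectors
```

Each row of pca.components_ is a length-1 direction vector, analogous to the [0.707, 0.707] vector above.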
Key Distinction
Linear regression predicts a target output y; PCA has no output to predict. It treats all features equally and simply reduces the number of axes needed to represent the data well.
Given a projected value (z = 3.55), you can approximate the original coordinates:
reconstruction = z × unit_vector = 3.55 × [0.71, 0.71] = [2.52, 2.52]
With just one number, you can get a reasonable approximation of the original two-dimensional coordinates.
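A minimal sketch of this reconstruction step for the worked example, using the rounded values from the notes:

```python
import numpy as np

u = np.array([0.71, 0.71])   # principal component direction (rounded)
z = 3.55                     # projected value for the example (2, 3)

x_approx = z * u             # map the single number back to two coordinates
print(x_approx)              # approximately [2.52, 2.52]
```

With scikit-learn, the analogous step is pca.inverse_transform(Z), which maps projected values back into the original feature space.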
PCA looks at the original data, finds the new axes (the principal components) that capture the most variance, and projects each example onto those axes to obtain a lower-dimensional representation.
The result enables visualization and analysis of high-dimensional data in lower-dimensional spaces while maintaining the most important characteristics of the original dataset.
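For example, a common pattern (sketched here with scikit-learn and matplotlib on randomly generated stand-in data) is to reduce the data to two components for a scatter plot and check how much variance those two components retain:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Randomly generated stand-in for a high-dimensional dataset (300 examples, 50 features).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))

pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # fraction of the variance each component keeps

plt.scatter(Z[:, 0], Z[:, 1])         # 2-D view of the 50-dimensional data
plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.show()
```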