
PCA Algorithm

PCA works by finding new axes to represent your data. If you have a dataset with two features x₁ and x₂, your data is initially plotted using these two axes. To reduce the number of features, you need to choose a new axis (the z-axis) that captures the data well with fewer dimensions.

Before applying PCA, features should be normalized to have zero mean (subtract the mean from each feature).

If features take on very different scales, perform feature scaling before PCA. For example:

  • x₁: House size in square feet (1,000-3,000)
  • x₂: Number of bedrooms (1-5)

Without scaling, the feature with the much larger range (house size in square feet) would dominate the variance and skew PCA's choice of axes.
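
As a minimal sketch of these two preprocessing steps (assuming NumPy; the numbers below are made up to match the house-size and bedroom example):

import numpy as np

# Hypothetical raw data: column 0 = house size in square feet, column 1 = bedrooms
X = np.array([[2100.0, 3.0],
              [1600.0, 2.0],
              [2900.0, 5.0],
              [1200.0, 1.0],
              [2400.0, 4.0]])

# Zero-mean normalization: subtract each feature's mean
X_centered = X - X.mean(axis=0)

# Feature scaling: divide by each feature's standard deviation so that
# square feet and bedroom counts end up on comparable scales
X_scaled = X_centered / X.std(axis=0)

print(X_scaled.mean(axis=0))   # ~[0, 0]
print(X_scaled.std(axis=0))    # ~[1, 1]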

Given five training examples, PCA must choose one axis instead of the original two to capture what’s important about the data.

Projection process:

  • Take each example and project it onto the chosen axis
  • Use line segments at 90-degree angles to the axis
  • The projection gives each example a single coordinate on the new axis, as shown in the sketch below
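
Here is a small sketch of that projection step (assuming NumPy; the five 2-D examples and the axis direction are made up for illustration):

import numpy as np

# Five made-up training examples, each with two features (x1, x2)
X = np.array([[1.0, 1.5],
              [2.0, 2.5],
              [3.0, 2.8],
              [4.0, 4.2],
              [5.0, 4.9]])

# Unit-length direction of the chosen axis
u = np.array([0.707, 0.707])

# The dot product with the axis direction projects each example at 90 degrees
# onto the axis, giving one coordinate per example
z = X @ u
print(z)   # five numbers instead of five (x1, x2) pairs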

If you choose an axis where projections result in points that are squished together:

  • The projected points have little variance
  • You capture much less information from the original dataset
  • The choice fails to preserve the data’s spread

If you choose an axis where projections result in points that are spread apart:

  • The projected points have large variance
  • You capture a lot of the variation and information in the original dataset
  • This preserves the essential characteristics of the data (see the sketch below)
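
To make the contrast between the two cases concrete, here is a rough sketch (assuming NumPy and made-up data): an axis aligned with the data's spread keeps much more variance than one that squishes the points together.

import numpy as np

# Made-up training examples, zero-mean normalized
X = np.array([[1.0, 1.5],
              [2.0, 2.5],
              [3.0, 2.8],
              [4.0, 4.2],
              [5.0, 4.9]])
X = X - X.mean(axis=0)

spread_axis   = np.array([0.707, 0.707])    # roughly aligned with the data's spread
squished_axis = np.array([0.707, -0.707])   # roughly perpendicular to the spread

print(np.var(X @ spread_axis))     # large variance: much of the information is kept
print(np.var(X @ squished_axis))   # small variance: the projections are squished together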

Principal Component Definition

In the PCA algorithm, the optimal axis is called the principal component: the axis that, when you project the data onto it, gives you the largest possible amount of variance.

For a training example with coordinates (2, 3) and a principal component axis defined by vector [0.71, 0.71]:

Projection formula:

projection = dot_product([2, 3], [0.71, 0.71])
= 2 × 0.71 + 3 × 0.71
= 3.55

This means the distance from the origin to the projected point is 3.55, giving us one number to represent this example instead of two.
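
A quick check of this arithmetic (assuming NumPy):

import numpy as np

x = np.array([2.0, 3.0])       # the training example
u = np.array([0.71, 0.71])     # the principal component direction (approximately unit length)

z = x @ u                      # dot product = projected coordinate
print(z)                       # 3.55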

The principal component direction is represented as a length-1 vector pointing in the direction of the z-axis. In this example: [0.71, 0.71] (which is actually [0.707, 0.707] with more precision).

  • Second axis: Always at 90 degrees to the first axis
  • Third axis: At 90 degrees to both first and second axes
  • Additional axes: Each subsequent axis is perpendicular to all previous axes

If you had 50 features and wanted three principal components:

  • First axis: Chosen to maximize variance
  • Second axis: Perpendicular to first, maximizes remaining variance
  • Third axis: Perpendicular to first two, maximizes remaining variance (see the sketch below)
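
As a rough sketch of how this looks in practice (assuming scikit-learn and a randomly generated stand-in for the 50-feature dataset), the three directions come back as mutually perpendicular unit vectors:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))    # stand-in dataset: 100 examples, 50 features

pca = PCA(n_components=3)
Z = pca.fit_transform(X)          # projected data: one row per example, 3 coordinates each

U = pca.components_               # the three axis directions, one per row
print(Z.shape)                    # (100, 3)
print(np.round(U @ U.T, 6))       # ~identity matrix: unit length, mutually perpendicular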
PCA is sometimes confused with linear regression, but the two algorithms do very different things.

Linear regression:

  • Data: Features x and labels y
  • Goal: Fit a line so the predicted value is close to the ground-truth label y
  • Optimization: Minimize vertical distances (aligned with the y-axis)
  • Special treatment: y is singled out as the target variable

PCA:

  • Data: Only features x₁, x₂, etc. (no labels y)
  • Goal: Find an axis z that preserves the data's variance when projected
  • Optimization: Minimize projection distances (perpendicular to the axis)
  • Equal treatment: All features (x₁, x₂, …, x₅₀) are treated equally

Key Distinction

Linear regression predicts a target output y, while PCA reduces the number of axes needed to represent data well by treating all features equally.

  • Linear regression: Can only fit lines in one orientation (predicting y from x)
  • PCA: Can choose any orientation for the principal component based on data structure (see the sketch below)
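
A small sketch of that difference (assuming NumPy and scikit-learn, with made-up correlated data): linear regression singles out one variable to predict, while PCA just returns the direction along which the data varies most.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, size=200)
x2 = 0.5 * x1 + rng.normal(scale=1.0, size=200)   # made-up correlated features
X = np.column_stack([x1, x2])

# Linear regression: x2 is singled out as the target and predicted from x1
reg = LinearRegression().fit(x1.reshape(-1, 1), x2)
print("regression slope:", reg.coef_[0])

# PCA: both features are treated equally; the result is a direction, not a predictor
pca = PCA(n_components=1).fit(X)
print("principal component direction:", pca.components_[0])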

Given a projected value (z = 3.55), you can approximate the original coordinates:

reconstruction = z × unit_vector
= 3.55 × [0.71, 0.71]
= [2.52, 2.52]

  • Original point: (2, 3)
  • Reconstructed point: (2.52, 2.52)
  • Approximation error: Small line segment between original and reconstructed points

With just one number, you can get a reasonable approximation of the original two-dimensional coordinates.
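
A minimal sketch of this reconstruction step (assuming NumPy, with the same numbers as above):

import numpy as np

z = 3.55                       # the projected coordinate from the previous step
u = np.array([0.71, 0.71])     # the principal component direction

x_original      = np.array([2.0, 3.0])
x_reconstructed = z * u        # step back from one number to two coordinates

print(x_reconstructed)                               # [2.5205 2.5205], roughly (2.52, 2.52)
print(np.linalg.norm(x_original - x_reconstructed))  # the small approximation error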

PCA looks at original data and:

  1. Chooses new axes (z or z₁, z₂, etc.) to represent data
  2. Projects original data onto these new axes
  3. Provides a smaller set of numbers for plotting and visualization
  4. Maximizes information retention by preserving variance

The result enables visualization and analysis of high-dimensional data in lower-dimensional spaces while maintaining the most important characteristics of the original dataset.
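
Putting the steps together, a typical end-to-end sketch might look like the following (assuming NumPy, scikit-learn, and matplotlib, with a randomly generated placeholder for a real high-dimensional dataset):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 50))          # placeholder for a real 50-feature dataset

# Normalize and scale the features, then choose new axes and project
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
Z = pca.fit_transform(X_scaled)         # a smaller set of numbers per example

# Fraction of the original variance each new axis retains
print(pca.explained_variance_ratio_)

# Visualize the high-dimensional data in two dimensions
plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel("z1")
plt.ylabel("z2")
plt.show()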