Anomaly Detection Algorithm

Multi-Feature Setup

Training Data

Training set: x⁽¹⁾ through x⁽ᵐ⁾
Each example x has n features
Each x⁽ⁱ⁾ is a vector with n numbers

Example: Aircraft Engine

Two features: heat (x₁) and vibrations (x₂)
Each x⁽ⁱ⁾ is 2-dimensional vector
n = 2
For practical applications: n can be dozens or hundreds

Density Estimation Model

Model Assumption

Build model for p(x) where:

p(x) = p(x₁) × p(x₂) × p(x₃) × ... × p(xₙ)

Statistical Independence

Assumes features x₁, x₂, …, xₙ are statistically independent
Algorithm often works fine even when features aren’t actually independent
Understanding statistical independence not required for effective use

Compact Notation

p(x) = ∏(j=1 to n) p(xⱼ)

Where ∏ symbol means “product” (like Σ means “sum”)

Parameter Estimation

Individual Feature Parameters

For each feature j:

μⱼ: Mean of feature j
σⱼ²: Variance of feature j

Parameter Formulas

Mean estimation:

μⱼ = (1/m) * Σ(i=1 to m) xⱼ⁽ⁱ⁾

Variance estimation:

σⱼ² = (1/m) * Σ(i=1 to m) (xⱼ⁽ⁱ⁾ - μⱼ)²

Vectorized Implementation

For efficiency, can compute all means simultaneously:

μ = (1/m) * Σ(i=1 to m) x⁽ⁱ⁾

Where x and μ are both vectors.

Complete Algorithm

Step 1: Choose Features

Select features x⁽ⁱ⁾ that might be indicative of anomalous examples.

Step 2: Fit Parameters

Estimate μ₁ through μₙ and σ₁² through σₙ² for n features using formulas above.

Step 3: Compute Probability

For new example x, compute:

p(x) = ∏(j=1 to n) p(xⱼ; μⱼ, σⱼ²)

Substituting Gaussian formula:

p(x) = ∏(j=1 to n) (1/√(2πσⱼ)) * e^(-(xⱼ-μⱼ)²/(2σⱼ²))

Step 4: Make Decision

If p(x) < ε: Flag as anomaly
If p(x) ≥ ε: Consider normal

Algorithm Intuition

Why It Works

Algorithm flags example as anomalous if one or more features are either:

Very large relative to training set, OR
Very small relative to training set

Mathematical Reasoning

Each feature xⱼ fits a Gaussian distribution
If any feature xⱼ is “way out here” → p(xⱼ) becomes very small
If any single term in product is very small → overall p(x) becomes small
Systematic way to quantify unusually large or small feature values

Concrete Example

Dataset Characteristics

Feature x₁: Large range of values
Feature x₂: Smaller range of values

Fitted Parameters

μ₁ = 5, σ₁ = 2 (for x₁)
μ₂ = 3, σ₂ = 1 (for x₂)

3D Probability Surface

Multiplying p(x₁) × p(x₂) creates 3D surface
Height at any point = product of individual probabilities
Higher probability near center
Lower probability toward edges

Test Examples

Example 1 (x_test1):

p(x) ≈ 0.4 (much larger than ε = 0.02)
Decision: Normal, not anomalous

Example 2 (x_test2) (x₁ ≈ 8, x₂ ≈ 0.5):

p(x) ≈ 0.0021 (much smaller than ε = 0.02)
Decision: Flag as anomaly

This systematic approach provides a principled method for detecting anomalies by modeling normal behavior and identifying significant deviations from expected patterns.