Anomaly Detection Algorithm
Anomaly Detection Algorithm
Section titled “Anomaly Detection Algorithm”Multi-Feature Setup
Section titled “Multi-Feature Setup”Training Data
Section titled “Training Data”- Training set: x⁽¹⁾ through x⁽ᵐ⁾
- Each example x has n features
- Each x⁽ⁱ⁾ is a vector with n numbers
Example: Aircraft Engine
Section titled “Example: Aircraft Engine”- Two features: heat (x₁) and vibrations (x₂)
- Each x⁽ⁱ⁾ is 2-dimensional vector
- n = 2
- For practical applications: n can be dozens or hundreds
Density Estimation Model
Section titled “Density Estimation Model”Model Assumption
Section titled “Model Assumption”Build model for p(x) where:
p(x) = p(x₁) × p(x₂) × p(x₃) × ... × p(xₙ)
Statistical Independence
Section titled “Statistical Independence”- Assumes features x₁, x₂, …, xₙ are statistically independent
- Algorithm often works fine even when features aren’t actually independent
- Understanding statistical independence not required for effective use
Compact Notation
Section titled “Compact Notation”p(x) = ∏(j=1 to n) p(xⱼ)
Where ∏ symbol means “product” (like Σ means “sum”)
Parameter Estimation
Section titled “Parameter Estimation”Individual Feature Parameters
Section titled “Individual Feature Parameters”For each feature j:
- μⱼ: Mean of feature j
- σⱼ²: Variance of feature j
Parameter Formulas
Section titled “Parameter Formulas”Mean estimation:
μⱼ = (1/m) * Σ(i=1 to m) xⱼ⁽ⁱ⁾
Variance estimation:
σⱼ² = (1/m) * Σ(i=1 to m) (xⱼ⁽ⁱ⁾ - μⱼ)²
Vectorized Implementation
Section titled “Vectorized Implementation”For efficiency, can compute all means simultaneously:
μ = (1/m) * Σ(i=1 to m) x⁽ⁱ⁾
Where x and μ are both vectors.
Complete Algorithm
Section titled “Complete Algorithm”Step 1: Choose Features
Section titled “Step 1: Choose Features”Select features x⁽ⁱ⁾ that might be indicative of anomalous examples.
Step 2: Fit Parameters
Section titled “Step 2: Fit Parameters”Estimate μ₁ through μₙ and σ₁² through σₙ² for n features using formulas above.
Step 3: Compute Probability
Section titled “Step 3: Compute Probability”For new example x, compute:
p(x) = ∏(j=1 to n) p(xⱼ; μⱼ, σⱼ²)
Substituting Gaussian formula:
p(x) = ∏(j=1 to n) (1/√(2πσⱼ)) * e^(-(xⱼ-μⱼ)²/(2σⱼ²))
Step 4: Make Decision
Section titled “Step 4: Make Decision”- If p(x) < ε: Flag as anomaly
- If p(x) ≥ ε: Consider normal
Algorithm Intuition
Section titled “Algorithm Intuition”Why It Works
Section titled “Why It Works”Algorithm flags example as anomalous if one or more features are either:
- Very large relative to training set, OR
- Very small relative to training set
Mathematical Reasoning
Section titled “Mathematical Reasoning”- Each feature xⱼ fits a Gaussian distribution
- If any feature xⱼ is “way out here” → p(xⱼ) becomes very small
- If any single term in product is very small → overall p(x) becomes small
- Systematic way to quantify unusually large or small feature values
Concrete Example
Section titled “Concrete Example”Dataset Characteristics
Section titled “Dataset Characteristics”- Feature x₁: Large range of values
- Feature x₂: Smaller range of values
Fitted Parameters
Section titled “Fitted Parameters”- μ₁ = 5, σ₁ = 2 (for x₁)
- μ₂ = 3, σ₂ = 1 (for x₂)
3D Probability Surface
Section titled “3D Probability Surface”- Multiplying p(x₁) × p(x₂) creates 3D surface
- Height at any point = product of individual probabilities
- Higher probability near center
- Lower probability toward edges
Test Examples
Section titled “Test Examples”Example 1 (x_test1):
- p(x) ≈ 0.4 (much larger than ε = 0.02)
- Decision: Normal, not anomalous
Example 2 (x_test2) (x₁ ≈ 8, x₂ ≈ 0.5):
- p(x) ≈ 0.0021 (much smaller than ε = 0.02)
- Decision: Flag as anomaly
This systematic approach provides a principled method for detecting anomalies by modeling normal behavior and identifying significant deviations from expected patterns.