Finding Unusual Events

What is Anomaly Detection?

Anomaly detection algorithms look at an unlabeled dataset of normal events and learn from it to raise a red flag when an unusual or anomalous event occurs.

Aircraft Engine Example

Problem Setup

After an aircraft engine rolls off the assembly line, compute features such as:
- x₁ = heat generated by engine
- x₂ = vibration intensity
- Additional features as needed

Data Collection Challenge

Aircraft engine manufacturers don’t make many bad engines, so defective examples are rare
It is easier to collect data from m normal engines (most are fine)
Collect features x₁ and x₂ describing how these m engines behave
As a result, most training examples are normal engines rather than defective ones

The Detection Task

Given:

m examples of normal engine behavior
New engine with feature vector X_test

Goal: Determine whether the new engine looks similar to previously manufactured ones, or whether something about it is suspicious and requires closer inspection.

How Anomaly Detection Works

Visual Example

Plot training examples x⁽¹⁾ through x⁽ᵐ⁾ as crosses on axes x₁ (heat) and x₂ (vibration); each point is one engine with its specific heat and vibration values
If the new engine appears near the training data: probably okay, it looks similar to other engines
If the new engine appears far from the training data: likely an anomaly, inspect it more carefully before installation

Density Estimation Approach

The most common method uses density estimation:

Build probability model: Learn p(x) from training set
- Determine which feature values have high probability
- Identify which values have lower probability
Probability regions:
- Inner ellipse: High probability region
- Middle ellipse: Medium probability
- Outer ellipse: Lower probability
- Outside ellipses: Very low probability
Classification rule:
- Compute p(X_test) for new example
- If p(X_test) < ε (small threshold): Flag as anomaly
- If p(X_test) ≥ ε: Classify as normal
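
The steps above can be sketched in Python. The notes do not say which form p(x) takes; this sketch assumes, purely for illustration, that each feature is modeled as an independent Gaussian, so p(x) is the product of the per-feature densities. The function names, the synthetic heat/vibration readings, and the value of ε are all hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate a per-feature mean and variance from training examples (rows of X)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)  # maximum-likelihood estimate (divides by m)
    return mu, var

def density(x, mu, var):
    """p(x) as a product of independent 1-D Gaussian densities (an assumption)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((x - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=-1)

def is_anomaly(x, mu, var, epsilon):
    """Classification rule from the notes: flag when p(x) < epsilon."""
    return density(x, mu, var) < epsilon

# Synthetic stand-in for m = 500 normal engines: columns are x1 (heat), x2 (vibration).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[50.0, 5.0], scale=[2.0, 0.5], size=(500, 2))

mu, var = fit_gaussian(X_train)
epsilon = 1e-4  # illustrative threshold, not a recommended value

x_near = np.array([50.5, 5.1])  # close to the training data
x_far = np.array([70.0, 9.0])   # far from the training data

print(is_anomaly(x_near, mu, var, epsilon))
print(is_anomaly(x_far, mu, var, epsilon))
```

Under this axis-aligned Gaussian assumption, the contours of equal p(x) are ellipses, which matches the nested regions described above. The value of ε here is chosen only so that the two example engines fall on opposite sides of the threshold.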

Applications of Anomaly Detection

Fraud Detection

Features for user behavior:

Login frequency
Number of web pages visited
Number of transactions made
Discussion forum posts
Typing speed (characters per second)

Process:

Model p(x) from data to understand typical user behavior
Don’t automatically disable accounts flagged as suspicious
Instead, request additional security verification:
- Cell phone verification
- CAPTCHA challenges
- Identity confirmation
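
A minimal sketch of this policy, assuming p(x) has already been computed for an account's feature vector by some density model. The `Action` enum, the `choose_action` function, and the threshold value are hypothetical names chosen for illustration, not part of any real fraud system.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    REQUEST_VERIFICATION = "request_verification"  # e.g. phone check or CAPTCHA

def choose_action(p_x: float, epsilon: float) -> Action:
    """Ask for extra verification on low-probability behavior; never disable outright."""
    if p_x < epsilon:
        return Action.REQUEST_VERIFICATION
    return Action.ALLOW

# Hypothetical probabilities for two accounts under a learned p(x).
epsilon = 1e-3
typical_account = choose_action(0.2, epsilon)
unusual_account = choose_action(1e-6, epsilon)
print(typical_account, unusual_account)
```

Note that the set of actions deliberately has no "disable" option: a flagged account is only asked to verify itself, so a false alarm costs a legitimate user a small inconvenience rather than access.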

Use cases:

Fake account detection
Financial fraud identification (unusual purchase patterns)

Manufacturing Applications

Wide industry usage:

Aircraft engines
Printed circuit boards
Smartphones
Motors
Many other manufactured items

Purpose: Detect units that behave strangely, indicating potential defects before shipping to customers.

Computer System Monitoring

Features for machine monitoring:

Memory usage
Disk accesses per second
CPU load
Ratio features (e.g., CPU load to network traffic)
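
A ratio feature like the one mentioned above can be computed directly from two raw measurements. This sketch assumes per-machine CPU load and network traffic are available as arrays; the function name, the sample readings, and the small constant guarding against division by zero are illustrative assumptions.

```python
import numpy as np

def cpu_to_network_ratio(cpu_load, network_traffic, eps=1e-9):
    """Ratio of CPU load to network traffic; eps avoids division by zero."""
    cpu_load = np.asarray(cpu_load, dtype=float)
    network_traffic = np.asarray(network_traffic, dtype=float)
    return cpu_load / (network_traffic + eps)

# Hypothetical readings for three machines.
cpu_load = [0.20, 0.90, 0.85]        # fraction of CPU in use
network_traffic = [10.0, 45.0, 0.5]  # arbitrary traffic units

ratios = cpu_to_network_ratio(cpu_load, network_traffic)
print(ratios)
```

In this made-up data the second and third machines have similar CPU load, so CPU load alone would not separate them. The ratio does: the third machine is busy while handling almost no traffic, which is the kind of combination a single raw feature can miss.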

Detection targets:

Hardware failures (hard disk, network card)
Security breaches (hacking attempts)
Unusual system behavior

Real-World Examples

Telecommunications: Monitor cell towers for unusual behavior
Financial services: Detect fraudulent transactions
Manufacturing: Quality control for anomalous parts

Anomaly detection provides a systematic approach to identifying unusual events by learning what “normal” looks like and flagging significant deviations from typical patterns.