Finding Unusual Events
What is Anomaly Detection?
Anomaly detection algorithms learn from an unlabeled dataset of normal events and raise a red flag when they encounter an unusual, or anomalous, event.
Aircraft Engine Example
Problem Setup
- After an aircraft engine rolls off the assembly line, compute features such as:
- x₁ = heat generated by engine
- x₂ = vibration intensity
- Additional features as needed
Data Collection Challenge
- Aircraft engine manufacturers produce very few bad engines
- It is therefore easier to collect data from m normal engines (most engines are fine)
- Collect features x₁ and x₂ about how these m engines behave
- Most examples are normal engines rather than defective ones
The Detection Task
Given:
- m examples of normal engine behavior
- New engine with feature vector X_test
Goal: Determine whether the new engine looks similar to previously manufactured engines or whether something about it is suspicious and requires closer inspection.
How Anomaly Detection Works
Visual Example
- Plot the training examples x⁽¹⁾ through x⁽ᵐ⁾ as crosses (each point is one engine with its specific heat and vibration values)
- New engine appears near training data: Probably okay, looks similar to other engines
- New engine appears far from training data: Likely anomaly, inspect more carefully before installation
Density Estimation Approach
The most common approach is density estimation:
Build a probability model:
- Learn p(x) from the training set
- Determine which feature values have high probability
- Identify which values have lower probability
Probability regions:
- Inner ellipse: High probability region
- Middle ellipse: Medium probability
- Outer ellipse: Lower probability
- Outside the ellipses: Very low probability
Classification rule:
- Compute p(X_test) for the new example
- If p(X_test) < ε (a small threshold): Flag as anomaly
- If p(X_test) ≥ ε: Classify as normal
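The density-estimation steps can be sketched in Python. This is a minimal sketch under assumptions the text does not state: each feature is modeled as an independent Gaussian (so p(x) is a product of one-dimensional normal densities), the training data are synthetic (heat, vibration) pairs, and `epsilon = 0.01` is a hypothetical threshold that would normally be tuned. The names `fit_gaussian`, `p`, and `is_anomaly` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

def fit_gaussian(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Estimate a mean and variance for each feature from normal training examples."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    # Assumes every feature varies across the training set (variance > 0).
    var = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, var

def p(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Model p(x) as a product of independent per-feature Gaussian densities (an assumption)."""
    density = 1.0
    for xj, mj, vj in zip(x, mu, var):
        density *= math.exp(-((xj - mj) ** 2) / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return density

def is_anomaly(x_test: Sequence[float], mu, var, epsilon: float) -> bool:
    """Classification rule: flag x_test when p(x_test) < epsilon."""
    return p(x_test, mu, var) < epsilon

# Synthetic training set of normal engines: [heat, vibration] per engine.
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]
mu, var = fit_gaussian(X_train)
epsilon = 0.01  # hypothetical threshold

print(is_anomaly([1.05, 2.0], mu, var, epsilon))  # False: close to the training data
print(is_anomaly([2.0, 0.5], mu, var, epsilon))   # True: far from the training data
```

Under this independence assumption, the curves where p(x) is constant are axis-aligned ellipses centered on the feature means, which mirrors the nested high-, medium-, and low-probability regions.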
Applications of Anomaly Detection
Fraud Detection
Features for user behavior:
- Login frequency
- Number of web pages visited
- Number of transactions made
- Number of discussion forum posts
- Typing speed (characters per second)
Process:
- Model p(x) from data to learn what typical user behavior looks like
- Flag users whose p(x) falls below the threshold ε as suspicious
- Don’t automatically disable suspicious accounts; instead, request additional security verification
Verification options:
- Cell phone verification
- CAPTCHA challenges
- Identity confirmation
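The fraud-detection process can be sketched around a p(x) model of user behavior, with the key policy choice that a low p(x) triggers a verification request rather than an account ban. This is a minimal sketch under stated assumptions: features are modeled as independent Gaussians, p(x) is computed in log space (a sum of per-feature log densities) to avoid numeric underflow when many small densities are multiplied, the user data are synthetic, and `log_epsilon = -20.0` is a hypothetical threshold. The names `fit`, `log_p`, and `review_action` are illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple

def fit(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population variance over typical users."""
    columns = list(zip(*X))
    mu = [statistics.fmean(col) for col in columns]
    var = [statistics.pvariance(col, m) for col, m in zip(columns, mu)]
    return mu, var

def log_p(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Log of a product of independent Gaussians = sum of per-feature log densities."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        for xj, m, v in zip(x, mu, var)
    )

def review_action(x: Sequence[float], mu, var, log_epsilon: float) -> str:
    """Never disable outright; ask unusual users for extra verification instead."""
    if log_p(x, mu, var) < log_epsilon:
        return "request_verification"  # e.g. phone check, CAPTCHA, identity confirmation
    return "allow"

# Synthetic users: [logins/day, pages visited, transactions, forum posts, typing speed (chars/s)]
users = [
    [3, 40, 2, 1, 5.0],
    [4, 35, 1, 0, 4.5],
    [2, 50, 3, 2, 5.5],
    [3, 45, 2, 1, 5.0],
]
mu, var = fit(users)
log_epsilon = -20.0  # hypothetical threshold on log p(x)

print(review_action([3, 42, 2, 1, 5.0], mu, var, log_epsilon))      # allow
print(review_action([40, 400, 30, 0, 25.0], mu, var, log_epsilon))  # request_verification
```

Comparing log p(x) with log ε gives the same decision as comparing p(x) with ε, because the logarithm is monotonic.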
Use cases:
- Fake account detection
- Financial fraud identification (unusual purchase patterns)
Manufacturing Applications
Wide industry usage:
- Aircraft engines
- Printed circuit boards
- Smartphones
- Motors
- Many other manufactured items
Purpose: Detect units that behave strangely, which may indicate defects, before they ship to customers.
Computer System Monitoring
Features for machine monitoring:
- Memory usage
- Disk accesses per second
- CPU load
- Ratio features (e.g., CPU load to network traffic)
Detection targets:
- Hardware failures (hard disk, network card)
- Security breaches (hacking attempts)
- Unusual system behavior
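A ratio feature can catch a machine that the raw features miss. This is a minimal sketch assuming a hypothetical workload in which CPU load normally scales with network traffic: a machine with high CPU load but little traffic (for example, a process stuck in a loop) has each raw value inside the training range, yet its CPU-to-traffic ratio is far from normal. The synthetic data, the names `ratio_feature` and `gaussian_density`, and `epsilon = 1.0` are all hypothetical; because p(x) is a density, a sensible ε depends on the feature's scale.

```python
import math
from typing import List, Tuple

def ratio_feature(cpu_load: float, network_traffic: float) -> float:
    """New feature: CPU load divided by network traffic (assumes traffic > 0)."""
    return cpu_load / network_traffic

def gaussian_density(value: float, mu: float, var: float) -> float:
    """One-dimensional normal density."""
    return math.exp(-((value - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Synthetic normal machines: (CPU load fraction, network traffic in hypothetical units)
machines: List[Tuple[float, float]] = [(0.2, 20), (0.42, 40), (0.57, 60), (0.8, 80), (0.5, 50)]
ratios = [ratio_feature(cpu, net) for cpu, net in machines]
mu = sum(ratios) / len(ratios)
var = sum((r - mu) ** 2 for r in ratios) / len(ratios)
epsilon = 1.0  # hypothetical threshold; depends on the ratio's scale

stuck = (0.8, 20)  # high CPU, low traffic: each raw value lies within the training range
print(gaussian_density(ratio_feature(*stuck), mu, var) < epsilon)   # True: anomalous ratio
print(gaussian_density(ratio_feature(0.3, 30), mu, var) < epsilon)  # False: typical ratio
```

Neither CPU load nor network traffic alone would flag the stuck machine here; only the combination does, which is why engineered ratio features are worth adding.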
Real-World Examples
- Telecommunications: Monitor cell towers for unusual behavior
- Financial services: Detect fraudulent transactions
- Manufacturing: Quality control for anomalous parts
Anomaly detection provides a systematic approach to identifying unusual events by learning what “normal” looks like and flagging significant deviations from typical patterns.