Pablo Rodriguez

Finding Unusual Events

Anomaly detection algorithms learn from an unlabeled dataset of normal events and raise a red flag when a new event looks unusual or anomalous.

  • After aircraft engine rolls off assembly line, compute features:
    • x₁ = heat generated by engine
    • x₂ = vibration intensity
    • Additional features as needed
  • Aircraft engine manufacturers don’t make many bad engines, so most examples come from normal engines rather than defective ones
  • Easier to collect data from m normal engines
  • Collect features x₁ and x₂ about how these m engines behave

Given:

  • m examples of normal engine behavior
  • New engine with feature vector X_test

Goal: Determine if new engine looks similar to previously manufactured ones or if there’s something suspicious that requires closer inspection.

  • Plot training examples x⁽¹⁾ through x⁽ᵐ⁾ as crosses (each point = one specific engine with its own heat x₁ and vibration x₂)
  • New engine appears near training data: Probably okay, looks similar to other engines
  • New engine appears far from training data: Likely anomaly, inspect more carefully before installation

Most common method uses density estimation:

  1. Build probability model: Learn p(x) from training set

    • Determine which feature values have high probability
    • Identify which values have lower probability
  2. Probability regions (ellipses are contours of p(x) around the training data):

    • Inner ellipse: High probability region
    • Middle ellipse: Medium probability
    • Outer ellipse: Lower probability
    • Outside ellipses: Very low probability
  3. Classification rule:

    • Compute p(X_test) for new example
    • If p(X_test) < ε (small threshold): Flag as anomaly
    • If p(X_test) ≥ ε: Classify as normal
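
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the definitive method: the notes only say to learn p(x), so modeling each feature as an independent Gaussian is an assumption here, as are the function names, the synthetic heat/vibration data, and the value of ε.

```python
import math
import random

def fit_gaussian(examples):
    """Estimate a mean and variance for each feature from normal examples."""
    m = len(examples)
    n = len(examples[0])
    mu = [sum(x[j] for x in examples) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in examples) / m for j in range(n)]
    return mu, var

def density(x, mu, var):
    """p(x) as a product of independent 1-D Gaussian densities (an assumption)."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-((xj - mj) ** 2) / (2.0 * vj)) / math.sqrt(2.0 * math.pi * vj)
    return p

def is_anomaly(x_test, mu, var, epsilon):
    """Classification rule from the notes: flag when p(x_test) < epsilon."""
    return density(x_test, mu, var) < epsilon

# Synthetic stand-in for m = 500 normal engines: (heat x1, vibration x2).
rng = random.Random(0)
X_train = [(rng.gauss(50.0, 2.0), rng.gauss(5.0, 0.5)) for _ in range(500)]
mu, var = fit_gaussian(X_train)

epsilon = 1e-4  # hypothetical threshold; in practice it would be tuned
flag_near = is_anomaly((50.5, 5.1), mu, var, epsilon)  # close to the training data
flag_far = is_anomaly((70.0, 9.0), mu, var, epsilon)   # far from the training data
print(flag_near, flag_far)
```

The engine near the training data gets a density well above ε and is treated as normal, while the distant one falls far below ε and is flagged for inspection.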

Features for user behavior:

  • Login frequency
  • Number of web pages visited
  • Number of transactions made
  • Discussion forum posts
  • Typing speed (characters per second)

Process:

  • Model p(x) from data to understand typical user behavior
  • Don’t automatically disable suspicious accounts
  • Instead: request additional security verification
    • Cell phone verification
    • CAPTCHA challenges
    • Identity confirmation
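
The process above amounts to a small decision policy: a low p(x) triggers extra verification rather than an account shutdown. The sketch below shows one way to express that; the `Action` enum and `choose_action` function are hypothetical names introduced for illustration, and the density value is assumed to come from whatever model of p(x) was fit to user behavior.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    REQUEST_VERIFICATION = "request_verification"

def choose_action(p_x, epsilon):
    """Map a density value to a response.

    Suspicious activity (p_x < epsilon) leads to an extra verification
    step such as a phone check or CAPTCHA. There is deliberately no
    "disable account" outcome in this policy.
    """
    if p_x < epsilon:
        return Action.REQUEST_VERIFICATION
    return Action.ALLOW

epsilon = 1e-4  # hypothetical threshold
suspicious = choose_action(1e-6, epsilon)
typical = choose_action(0.2, epsilon)
print(suspicious.value, typical.value)
```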

Use cases:

  • Fake account detection
  • Financial fraud identification (unusual purchase patterns)

Wide industry usage:

  • Aircraft engines
  • Printed circuit boards
  • Smartphones
  • Motors
  • Many other manufactured items

Purpose: Detect units that behave strangely, indicating potential defects before shipping to customers.

Features for machine monitoring:

  • Memory usage
  • Disk accesses per second
  • CPU load
  • Ratio features (e.g., CPU load to network traffic)
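
A ratio feature can expose a combination that looks unremarkable feature by feature, for example high CPU load together with very low network traffic. The sketch below builds such a feature vector; the function name, argument names, units, and sample readings are all assumptions made for illustration.

```python
def build_features(memory_usage, disk_accesses_per_sec, cpu_load, network_traffic,
                   tiny=1e-9):
    """Return [memory, disk accesses/s, CPU load, CPU-load-to-network-traffic ratio].

    `tiny` guards against division by zero when a machine reports no traffic.
    """
    ratio = cpu_load / (network_traffic + tiny)
    return [memory_usage, disk_accesses_per_sec, cpu_load, ratio]

# Hypothetical readings: a machine serving traffic vs. one that is busy but silent.
serving = build_features(0.6, 120.0, 0.50, 100.0)
busy_but_silent = build_features(0.6, 120.0, 0.95, 1.0)
print(serving[3], busy_but_silent[3])
```

The ratio for the second machine is far larger than for the first, so a model of p(x) that includes this feature can assign it a low probability even though its CPU load alone is within a plausible range.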

Detection targets:

  • Hardware failures (hard disk, network card)
  • Security breaches (hacking attempts)
  • Unusual system behavior

Other application areas:

  • Telecommunications: Monitor cell towers for unusual behavior
  • Financial services: Detect fraudulent transactions
  • Manufacturing: Quality control for anomalous parts

Anomaly detection provides a systematic approach to identifying unusual events by learning what “normal” looks like and flagging significant deviations from typical patterns.