
Choosing Features

Why Features Matter More in Anomaly Detection

Supervised learning:

  • The algorithm can learn which irrelevant features to ignore
  • The supervised signal (labels) helps guide how features are used
  • Extra irrelevant features are often okay

Anomaly detection:

  • Learns from unlabeled data only
  • Harder for the algorithm to determine which features are relevant
  • Careful feature selection is more critical

Anomaly detection models assume features follow Gaussian distributions. Non-Gaussian features can hurt performance.

Step 1: Plot a histogram of the feature:

histogram-plot
import matplotlib.pyplot as plt
plt.hist(X, bins=50)

Good feature: Distribution looks approximately Gaussian (bell-shaped)

Problematic feature: Highly skewed or non-symmetric distribution
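
For reference, here is a minimal sketch (using synthetic NumPy data, not features from these notes) of what the two cases look like in a histogram:

gaussian-vs-skewed
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
gaussian_feature = rng.normal(loc=5.0, scale=1.0, size=5000)   # roughly bell-shaped: usable as-is
skewed_feature = rng.exponential(scale=2.0, size=5000)         # right-skewed: candidate for a transform

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(gaussian_feature, bins=50)
axes[0].set_title("Approximately Gaussian (good)")
axes[1].hist(skewed_feature, bins=50)
axes[1].set_title("Right-skewed (consider a transform)")
plt.show()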

When features aren’t Gaussian, apply transformations:

log-transform
import numpy as np
X_new = np.log(X)

Use when: Original feature is right-skewed

log-offset-transform
X_new = np.log(X + c)

Parameters:

  • Larger c means a milder transformation (the distribution is changed less)
  • Try different values of c to find the best fit (see the sketch below)
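
A minimal sketch of that sweep, assuming X is a NumPy array holding one non-negative feature (the offsets listed are just example values):

log-offset-sweep
import numpy as np
import matplotlib.pyplot as plt

candidate_offsets = [0.001, 0.1, 1.0, 10.0]    # example values; scale them to your feature
fig, axes = plt.subplots(1, len(candidate_offsets), figsize=(16, 3))
for ax, c in zip(axes, candidate_offsets):
    ax.hist(np.log(X + c), bins=50)            # larger c leaves the shape closer to the original
    ax.set_title(f"log(X + {c})")
plt.show()
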
power-transforms
X_new = X ** (1/2) # Square root
X_new = X ** (1/3) # Cube root
X_new = X ** 0.4 # Custom power

Process:

  1. Plot histogram with 50 bins
  2. Try different transformations
  3. Adjust parameters interactively
  4. Pick transformation that looks most Gaussian

Example exploration:

feature-exploration
import numpy as np
import matplotlib.pyplot as plt

# Inspect one candidate at a time (comment the others out, or give each its own figure)
plt.hist(X, bins=50, color='blue')
plt.hist(X**0.5, bins=50)             # try square root
plt.hist(X**0.4, bins=50)             # try a different power
plt.hist(np.log(X + 0.001), bins=50)  # try log transform (small offset avoids log(0))
plt.show()

  • Quick evaluation: Visual inspection usually sufficient
  • Automated measures exist but visual assessment works well in practice
  • Apply same transformation to training, cross-validation, and test sets
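
A minimal sketch of that last point, assuming X_train, X_cv, and X_test are NumPy arrays and the chosen transformation was a log with a small offset (placeholder names, not from the original notes):

apply-transform
import numpy as np

def transform(X, c=0.001):
    # The transformation chosen during the histogram exploration: log with a small offset
    return np.log(X + c)

X_train_new = transform(X_train)   # choose the transform by inspecting training data only
X_cv_new = transform(X_cv)         # then apply the identical transform to the cross-validation set...
X_test_new = transform(X_test)     # ...and the test set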

Even with well-chosen transformations, detection can still fail. Issue: p(x) is comparably large for both normal and anomalous examples.

Example scenario:

  • Anomalous example appears “normal” according to existing features
  • High probability despite being truly anomalous
  • Algorithm fails to flag genuine anomaly

Process:

  1. Identify failure cases: Look at anomalies missed by the algorithm (a sketch follows this list)
  2. Analyze what makes them anomalous: What distinguishes them from normal examples?
  3. Create new features: Add features that capture these differences
  4. Test improvement: Check if new features help detection
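
Here is a minimal sketch of step 1, assuming p_cv holds the model's p(x) values on the cross-validation set, y_cv its labels (1 = anomaly), X_cv the examples themselves, and epsilon the current threshold (all placeholder names):

missed-anomalies
import numpy as np

flagged = p_cv < epsilon               # examples the model would call anomalous
missed = (y_cv == 1) & ~flagged        # true anomalies that still get a high p(x)
print("Missed anomalies:", int(np.sum(missed)))

X_missed = X_cv[missed]                # inspect these rows to see what distinguishes them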

Original feature: x₁ = number of transactions
Problem: A fraudulent user makes a similar number of transactions as normal users

Solution: Add x₂ = typing speed
Result: The fraudulent user has an unusually fast typing speed, making them easier to detect

Outcome: New 2D feature space separates anomaly from normal examples

Create ratio features:

feature-combinations
X5 = CPU_load / network_traffic # Computer monitoring
X6 = (CPU_load)**2 / network_traffic # Alternative combination

Example use case: Data center monitoring

  • Normal patterns: High CPU + high network traffic, OR low CPU + low network traffic
  • Anomalous pattern: High CPU + low network traffic
  • New feature: CPU/network ratio captures this anomaly (see the sketch below)
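
A small hypothetical sketch with made-up numbers: the anomalous machine's individual features look unremarkable, but its ratio feature is extreme:

ratio-feature-demo
import numpy as np

cpu_load = np.array([0.90, 0.20, 0.85])          # machines 0 and 1 are normal, machine 2 is anomalous
network_traffic = np.array([0.80, 0.15, 0.05])   # high+high, low+low, then high CPU with low traffic

x5 = cpu_load / network_traffic
print(x5)   # roughly [1.1, 1.3, 17.0]: the anomalous machine stands out on the new feature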

Computer monitoring features:

  • x₁ = memory usage
  • x₂ = disk accesses per second
  • x₃ = CPU load
  • x₄ = network traffic volume
  • x₅ = CPU load / network traffic (ratio)

Iterative process (a sketch of this loop follows the list):

  1. Train the model with the current features
  2. Identify missed anomalies in cross-validation set
  3. Analyze failure cases: What makes them anomalous?
  4. Engineer new features that capture these patterns
  5. Retrain and evaluate: Check if p(x) becomes small for anomalies
  6. Repeat until satisfactory performance
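
Putting the loop together, here is a minimal sketch of one iteration using the per-feature Gaussian density estimate, picking epsilon by F1 score on the cross-validation set; X_train, X_cv, and y_cv are placeholder names:

train-and-evaluate
import numpy as np

def fit_gaussian(X):
    # Estimate a mean and variance per feature from the (mostly normal) training data
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    # p(x) = product over features of univariate Gaussian densities
    p_per_feature = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(p_per_feature, axis=1)

mu, var = fit_gaussian(X_train)
p_cv = density(X_cv, mu, var)

# Choose the threshold epsilon that maximizes F1 on the cross-validation set
best_f1, best_eps = 0.0, 0.0
for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
    pred = p_cv < eps                      # flag low-probability examples as anomalies
    tp = np.sum(pred & (y_cv == 1))
    fp = np.sum(pred & (y_cv == 0))
    fn = np.sum(~pred & (y_cv == 1))
    if tp == 0:
        continue
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    if f1 > best_f1:
        best_f1, best_eps = f1, eps

print(f"Best F1 on CV: {best_f1:.3f} at epsilon = {best_eps:.3g}")
# If anomalies are still missed, return to step 3: analyze them and engineer new features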

Objective: Create features where:

  • p(x) remains large for normal examples
  • p(x) becomes small for anomalous examples

Success criteria: Clear separation between normal and anomalous examples in the feature space

Effective feature engineering is often the key to successful anomaly detection, requiring careful analysis of what makes anomalous examples truly different from normal ones.