
Choosing Features

Why Features Matter More in Anomaly Detection

Supervised learning:

  • The algorithm can learn which irrelevant features to ignore
  • The supervised signal (labels) helps guide how features are used
  • Extra irrelevant features are often okay

Anomaly detection:

  • Learns from unlabeled data only
  • Harder for the algorithm to determine which features are relevant
  • Careful feature selection is more critical

Anomaly detection models assume features follow Gaussian distributions. Non-Gaussian features can hurt performance.

Step 1: Plot a histogram of the feature:

histogram-plot
import matplotlib.pyplot as plt
plt.hist(X, bins=50)

Good feature: Distribution looks approximately Gaussian (bell-shaped)

Problematic feature: Highly skewed or non-symmetric distribution
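
For reference, here is a minimal sketch (using synthetic NumPy data, not features from these notes) of what the two cases look like in a histogram:

gaussian-vs-skewed
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
gaussian_feature = rng.normal(loc=5.0, scale=1.0, size=5000)   # roughly bell-shaped: usable as-is
skewed_feature = rng.exponential(scale=2.0, size=5000)         # right-skewed: candidate for a transform

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(gaussian_feature, bins=50)
axes[0].set_title("Approximately Gaussian (good)")
axes[1].hist(skewed_feature, bins=50)
axes[1].set_title("Right-skewed (consider a transform)")
plt.show()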

When features aren’t Gaussian, apply transformations:

log-transform
import numpy as np
X_new = np.log(X)

Use when: Original feature is right-skewed

log-offset-transform
X_new = np.log(X + c)

Parameters:

  • Larger c means a milder transformation (the distribution is changed less)
  • Try different values of c to find the best fit (see the sketch below)
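
A minimal sketch of that sweep, assuming X is a NumPy array holding one non-negative feature (the offsets listed are just example values):

log-offset-sweep
import numpy as np
import matplotlib.pyplot as plt

candidate_offsets = [0.001, 0.1, 1.0, 10.0]    # example values; scale them to your feature
fig, axes = plt.subplots(1, len(candidate_offsets), figsize=(16, 3))
for ax, c in zip(axes, candidate_offsets):
    ax.hist(np.log(X + c), bins=50)            # larger c leaves the shape closer to the original
    ax.set_title(f"log(X + {c})")
plt.show()
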
power-transforms
X_new = X ** (1/2) # Square root
X_new = X ** (1/3) # Cube root
X_new = X ** 0.4 # Custom power

Process:

  1. Plot histogram with 50 bins
  2. Try different transformations
  3. Adjust parameters interactively
  4. Pick transformation that looks most Gaussian

Example exploration:

feature-exploration
import numpy as np
import matplotlib.pyplot as plt

# Inspect one candidate at a time (comment the others out, or give each its own figure)
plt.hist(X, bins=50, color='blue')
plt.hist(X**0.5, bins=50)             # try square root
plt.hist(X**0.4, bins=50)             # try a different power
plt.hist(np.log(X + 0.001), bins=50)  # try log transform (small offset avoids log(0))
plt.show()

  • Quick evaluation: Visual inspection usually sufficient
  • Automated measures exist but visual assessment works well in practice
  • Apply same transformation to training, cross-validation, and test sets
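
A minimal sketch of that last point, assuming X_train, X_cv, and X_test are NumPy arrays and the chosen transformation was a log with a small offset (placeholder names, not from the original notes):

apply-transform
import numpy as np

def transform(X, c=0.001):
    # The transformation chosen during the histogram exploration: log with a small offset
    return np.log(X + c)

X_train_new = transform(X_train)   # choose the transform by inspecting training data only
X_cv_new = transform(X_cv)         # then apply the identical transform to the cross-validation set...
X_test_new = transform(X_test)     # ...and the test set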

Even with well-chosen transformations, detection can still fail. Issue: p(x) is comparably large for both normal and anomalous examples.

Example scenario:

  • Anomalous example appears “normal” according to existing features
  • High probability despite being truly anomalous
  • Algorithm fails to flag genuine anomaly

Process:

  1. Identify failure cases: Look at anomalies missed by the algorithm (a sketch follows this list)
  2. Analyze what makes them anomalous: What distinguishes them from normal examples?
  3. Create new features: Add features that capture these differences
  4. Test improvement: Check if new features help detection
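
Here is a minimal sketch of step 1, assuming p_cv holds the model's p(x) values on the cross-validation set, y_cv its labels (1 = anomaly), X_cv the examples themselves, and epsilon the current threshold (all placeholder names):

missed-anomalies
import numpy as np

flagged = p_cv < epsilon               # examples the model would call anomalous
missed = (y_cv == 1) & ~flagged        # true anomalies that still get a high p(x)
print("Missed anomalies:", int(np.sum(missed)))

X_missed = X_cv[missed]                # inspect these rows to see what distinguishes them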

Original feature: x₁ = number of transactions
Problem: A fraudulent user makes a similar number of transactions as normal users

Solution: Add x₂ = typing speed
Result: The fraudulent user has an unusually fast typing speed, making them easier to detect

Outcome: New 2D feature space separates anomaly from normal examples

Create ratio features:

feature-combinations
X5 = CPU_load / network_traffic # Computer monitoring
X6 = (CPU_load)**2 / network_traffic # Alternative combination

Example use case: Data center monitoring

  • Normal patterns: High CPU + high network traffic, OR low CPU + low network traffic
  • Anomalous pattern: High CPU + low network traffic
  • New feature: CPU/network ratio captures this anomaly (see the sketch below)
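
A small hypothetical sketch with made-up numbers: the anomalous machine's individual features look unremarkable, but its ratio feature is extreme:

ratio-feature-demo
import numpy as np

cpu_load = np.array([0.90, 0.20, 0.85])          # machines 0 and 1 are normal, machine 2 is anomalous
network_traffic = np.array([0.80, 0.15, 0.05])   # high+high, low+low, then high CPU with low traffic

x5 = cpu_load / network_traffic
print(x5)   # roughly [1.1, 1.3, 17.0]: the anomalous machine stands out on the new feature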

Computer monitoring features:

  • x₁ = memory usage
  • x₂ = disk accesses per second
  • x₃ = CPU load
  • x₄ = network traffic volume
  • x₅ = CPU load / network traffic (ratio)

Iterative process (a sketch of this loop follows the list):

  1. Train the model with the current features
  2. Identify missed anomalies in cross-validation set
  3. Analyze failure cases: What makes them anomalous?
  4. Engineer new features that capture these patterns
  5. Retrain and evaluate: Check if p(x) becomes small for anomalies
  6. Repeat until satisfactory performance
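
Putting the loop together, here is a minimal sketch of one iteration using the per-feature Gaussian density estimate, picking epsilon by F1 score on the cross-validation set; X_train, X_cv, and y_cv are placeholder names:

train-and-evaluate
import numpy as np

def fit_gaussian(X):
    # Estimate a mean and variance per feature from the (mostly normal) training data
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    # p(x) = product over features of univariate Gaussian densities
    p_per_feature = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(p_per_feature, axis=1)

mu, var = fit_gaussian(X_train)
p_cv = density(X_cv, mu, var)

# Choose the threshold epsilon that maximizes F1 on the cross-validation set
best_f1, best_eps = 0.0, 0.0
for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
    pred = p_cv < eps                      # flag low-probability examples as anomalies
    tp = np.sum(pred & (y_cv == 1))
    fp = np.sum(pred & (y_cv == 0))
    fn = np.sum(~pred & (y_cv == 1))
    if tp == 0:
        continue
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    if f1 > best_f1:
        best_f1, best_eps = f1, eps

print(f"Best F1 on CV: {best_f1:.3f} at epsilon = {best_eps:.3g}")
# If anomalies are still missed, return to step 3: analyze them and engineer new features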

Objective: Create features where:

  • p(x) remains large for normal examples
  • p(x) becomes small for anomalous examples

Success criteria: Clear separation between normal and anomalous examples in the feature space

Effective feature engineering is often the key to successful anomaly detection, requiring careful analysis of what makes anomalous examples truly different from normal ones.