Choosing What Features to Use
Importance of Feature Selection
Why Features Matter More in Anomaly Detection
Supervised learning:
- The algorithm can learn which irrelevant features to ignore
- The supervised signal (labels) guides feature usage
- A few extra irrelevant features are usually okay
Anomaly detection:
- Learns from unlabeled data only
- Harder for the algorithm to determine which features are relevant
- Careful feature selection is therefore more critical
Making Features More Gaussian
Why Gaussian Features Help
Anomaly detection models typically assume each feature follows a Gaussian distribution, so highly non-Gaussian features can hurt performance.
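A minimal sketch of that model (illustrative names; X is an m × n array of normal training examples): p(x) is the product of per-feature Gaussian densities.
import numpy as np

def estimate_gaussian(X):
    # fit a Gaussian to each feature: per-feature mean and variance
    return X.mean(axis=0), X.var(axis=0)

def gaussian_density(X, mu, var):
    # p(x) = product over features of the univariate Gaussian densities
    p = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)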
Checking Feature Distribution
First, plot a histogram of the feature:
import matplotlib.pyplot as plt
plt.hist(X, bins=50)
Good feature: Distribution looks approximately Gaussian (bell-shaped)
Problematic feature: Highly skewed or non-symmetric distribution
Feature Transformations
When a feature isn’t Gaussian, try transforming it:
Log Transformation
X_new = np.log(X)
Use when: the original feature is right-skewed
Log with Offset
X_new = np.log(X + c)
Parameters:
- Larger c means a weaker transformation
- Try several values of c and pick the one that fits best
Power Transformations
X_new = X ** (1/2)  # square root
X_new = X ** (1/3)  # cube root
X_new = X ** 0.4    # custom power
Interactive Feature Exploration
Process:
- Plot histogram with 50 bins
- Try different transformations
- Adjust parameters interactively
- Pick transformation that looks most Gaussian
Example exploration:
plt.hist(X, bins=50, color='blue')    # original feature
plt.hist(X ** 0.5, bins=50)           # try square root
plt.hist(X ** 0.4, bins=50)           # try a different power
plt.hist(np.log(X + 0.001), bins=50)  # try log transform
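To compare several candidates at once, a small grid of histograms helps (a sketch; the candidate list and layout are illustrative):
import numpy as np
import matplotlib.pyplot as plt

candidates = {
    'original': X,
    'sqrt': X ** 0.5,
    'power 0.4': X ** 0.4,
    'log(X + 0.001)': np.log(X + 0.001),
}
fig, axes = plt.subplots(1, len(candidates), figsize=(16, 3))
for ax, (name, values) in zip(axes, candidates.items()):
    ax.hist(values, bins=50)
    ax.set_title(name)
plt.show()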
Practical Guidelines
- Quick evaluation: visual inspection is usually sufficient
- Automated measures exist but visual assessment works well in practice
- Apply the same transformation to the training, cross-validation, and test sets (see the sketch below)
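For example, assuming the log-with-offset transform was chosen on the training set (a sketch; X_train, X_cv, and X_test are assumed to exist):
import numpy as np

c = 0.001  # offset chosen while exploring the training set
X_train_new = np.log(X_train + c)
X_cv_new = np.log(X_cv + c)      # same transform, same c
X_test_new = np.log(X_test + c)  # never re-tune c on the test set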
Error Analysis for Anomaly Detection
Section titled “Error Analysis for Anomaly Detection”Common Problem: Similar Probabilities
Issue: p(x) is comparably large for both normal and anomalous examples
Example scenario:
- Anomalous example appears “normal” according to existing features
- High probability despite being truly anomalous
- Algorithm fails to flag genuine anomaly
Solution: Add Discriminative Features
Process:
- Identify failure cases: Look at anomalies missed by algorithm
- Analyze what makes them anomalous: What distinguishes them from normal examples?
- Create new features: Add features that capture these differences
- Test improvement: Check if new features help detection
Concrete Example: Fraud Detection
Original feature: x₁ = number of transactions
Problem: a fraudulent user makes a similar number of transactions to normal users
Solution: add x₂ = typing speed
Result: the fraudulent user has an unusually fast typing speed, making them easier to detect
Outcome: the new 2D feature space separates the anomaly from normal examples
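A toy illustration of this effect, reusing the estimate_gaussian and gaussian_density sketch above (all numbers are synthetic):
import numpy as np

rng = np.random.default_rng(0)
transactions = rng.normal(50, 5, size=(500, 1))  # x1: transactions per week
typing_speed = rng.normal(40, 4, size=(500, 1))  # x2: words per minute
X_train = np.hstack([transactions, typing_speed])

fraud = np.array([[52.0, 95.0]])  # typical x1, extreme x2

mu, var = estimate_gaussian(X_train)
print(gaussian_density(fraud[:, :1], mu[:1], var[:1]))  # large: x1 alone misses it
print(gaussian_density(fraud, mu, var))                 # small: x2 exposes the fraud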
Feature Engineering Strategies
Section titled “Feature Engineering Strategies”Combining Existing Features
Create ratio features:
X5 = CPU_load / network_traffic       # computer monitoring: ratio feature
X6 = CPU_load ** 2 / network_traffic  # alternative combination
Example use case: Data center monitoring
- Normal patterns: High CPU + high network traffic, OR low CPU + low network traffic
- Anomalous pattern: High CPU + low network traffic
- New feature: CPU/network ratio captures this anomaly
Domain-Specific Features
Computer monitoring features:
- x₁ = memory usage
- x₂ = disk accesses per second
- x₃ = CPU load
- x₄ = network traffic volume
- x₅ = CPU load / network traffic (ratio)
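Assembled into a design matrix, this could look like the following sketch (the per-machine metric arrays are hypothetical):
import numpy as np

# each variable below is a hypothetical 1-D array of per-machine measurements
X = np.column_stack([
    memory_usage,
    disk_accesses_per_sec,
    cpu_load,
    network_traffic,
    cpu_load / network_traffic,  # x5: engineered ratio feature
])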
Iterative Development Process
- Train model with current features
- Identify missed anomalies in cross-validation set
- Analyze failure cases: What makes them anomalous?
- Engineer new features that capture these patterns
- Retrain and evaluate: check whether p(x) becomes small for the anomalies (a sketch of this step follows the list)
- Repeat until satisfactory performance
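A sketch of the evaluation step, assuming a labeled cross-validation set y_cv (1 = anomaly) and the helpers defined earlier:
import numpy as np

mu, var = estimate_gaussian(X_train_new)
p_cv = gaussian_density(X_cv_new, mu, var)

epsilon = 1e-3  # illustrative threshold; tune it on the CV set
flagged = p_cv < epsilon

missed = (y_cv == 1) & ~flagged  # anomalies the model still assigns high p(x)
print(f"missed anomalies: {missed.sum()}")  # analyze these to engineer new features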
Goal of Feature Engineering
Objective: create features where:
- p(x) remains large for normal examples
- p(x) becomes small for anomalous examples
Success criteria: Clear separation between normal and anomalous examples in the feature space
Effective feature engineering is often the key to successful anomaly detection, requiring careful analysis of what makes anomalous examples truly different from normal ones.