Bias/Variance and Neural Networks
Traditional Bias-Variance Tradeoff
Before neural networks, machine learning required balancing model complexity:
- Simple models: High bias (underfit)
- Complex models: High variance (overfit)
- Tradeoff required: Find optimal complexity level
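To make the tradeoff concrete, here is a minimal sketch (the synthetic sine dataset, scikit-learn pipeline, and degree choices are all illustrative assumptions) that sweeps model complexity and reports J_train and J_cv at each level:

```python
# Sketch: sweep polynomial degree and watch J_train vs. J_cv (illustrative).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(x).ravel() + rng.normal(0, 0.2, size=100)   # noisy synthetic target
x_tr, x_cv, y_tr, y_cv = train_test_split(x, y, test_size=0.4, random_state=0)

for degree in [1, 4, 15]:   # simple -> balanced -> complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    j_train = mean_squared_error(y_tr, model.predict(x_tr))
    j_cv = mean_squared_error(y_cv, model.predict(x_cv))
    print(f"degree={degree:2d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

A degree that is too low leaves J_train high (bias); a degree that is too high drives J_train down while J_cv climbs (variance).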
Neural Networks Change the Game
Large neural networks trained on small-to-moderate datasets are low-bias machines: made large enough, they can almost always fit the training set well.
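As a rough illustration of that claim (the layer widths, toy labels, and epoch count below are arbitrary assumptions, not recommendations), an oversized network can typically drive training loss close to zero on a small dataset:

```python
# Sketch: a network that is large relative to the training set memorizes it.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, size=(50, 2)).astype("float32")
y_train = (x_train[:, 0] * x_train[:, 1] > 0).astype("float32")  # toy labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x_train, y_train, epochs=500, verbose=0)
print("J_train:", model.evaluate(x_train, y_train, verbose=0))  # near zero
```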
New Recipe for Neural Networks
Step-by-Step Process
1. Train on the training set and measure J_train.
2. Check training performance: does the model do well relative to a baseline (e.g. human-level performance)?
   - If NO → High bias → Use a bigger network, then return to step 1
   - If YES → Continue to step 3
3. Check cross-validation performance: does the model do well on the CV set?
   - If NO → High variance → Get more data, then return to step 1
   - If YES → Done! (A sketch of this decision logic follows the list.)
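A minimal sketch of the decision logic above (the baseline error and the 0.05 gap threshold are hypothetical numbers, not fixed rules):

```python
# Sketch of the diagnostic recipe; thresholds are illustrative assumptions.
def diagnose(j_train, j_cv, baseline_error, gap=0.05):
    """Suggest the next action given training error, CV error, and a baseline."""
    if j_train - baseline_error > gap:
        return "High bias -> use a bigger network"
    if j_cv - j_train > gap:
        return "High variance -> get more data"
    return "Done!"

print(diagnose(j_train=0.20, j_cv=0.22, baseline_error=0.10))  # high bias
print(diagnose(j_train=0.11, j_cv=0.30, baseline_error=0.10))  # high variance
print(diagnose(j_train=0.11, j_cv=0.13, baseline_error=0.10))  # done
```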
Iterative Loop
- Keep making the network bigger until J_train reaches an acceptable level
- Then collect more data until J_cv approaches J_train
- Continue until both perform well (see the loop sketch below)
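Here is a hypothetical sketch of that loop; train_model below only simulates how a bigger network lowers J_train and how more data shrinks the J_cv gap, standing in for real training runs:

```python
# Sketch of the iterative loop; train_model is a simulation, not real training.
def train_model(hidden_units, n_examples):
    j_train = max(0.02, 1.0 / hidden_units)        # bigger net -> lower J_train
    j_cv = j_train + max(0.01, 50.0 / n_examples)  # more data -> smaller gap
    return j_train, j_cv

hidden_units, n_examples = 5, 200
target, gap = 0.05, 0.05
while True:
    j_train, j_cv = train_model(hidden_units, n_examples)
    if j_train > target:            # high bias: grow the network
        hidden_units *= 2
    elif j_cv - j_train > gap:      # high variance: collect more data
        n_examples *= 2
    else:                           # both acceptable: stop
        break
print(f"stopped at {hidden_units} units, {n_examples} examples")
```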
Key Advantages
Escapes the traditional tradeoff:
- Address bias by increasing network size
- Address variance by adding more data
- No need to sacrifice one for the other
Systematic approach:
- Clear diagnostic criteria
- Specific remedies for each problem
- Measurable progress indicators
Limitations and Considerations
Computational Constraints
Section titled “Computational Constraints”- Bigger networks: More expensive to train
- Hardware requirements: GPUs essential for large models
- Training time: Can become infeasible beyond a certain point
Data Limitations
- More data: Sometimes hard to obtain
- Collection costs: Can be prohibitive
- Quality concerns: Need representative samples
Large Neural Networks and Regularization
Regularization Implementation
Standard neural network cost function:
- Average loss (squared error or logistic loss)
- Plus regularization term: λ/(2m) × sum of w²
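Written out (the notation is an assumed reconstruction from the description above, with L the per-example loss), the regularized cost is:

```latex
J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right), y^{(i)}\right) + \frac{\lambda}{2m} \sum_{j} w_j^{2}
```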
```python
# TensorFlow regularization example
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(15, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])
```
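A typical way to compile and train the model above (the optimizer, loss, and epoch count are illustrative choices; X_train and y_train are placeholders for your own data):

```python
# Hypothetical usage of the regularized model defined above.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.BinaryCrossentropy())
model.fit(X_train, y_train, epochs=100, verbose=0)  # X_train/y_train: your data
```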
Practical Implications
When Training Neural Networks
- Fight variance more often than bias (if the network is large enough)
- Regularize appropriately to maintain performance
- Scale up systematically rather than randomly
Resource Allocation
- Computational budget: Plan for larger networks
- Data collection: Prioritize when variance is limiting factor
- Time investment: Bigger networks = longer training
Neural networks fundamentally changed how we think about the bias-variance tradeoff, providing a systematic path to better performance without the traditional compromise.