
Bias, Variance, and Neural Networks

Before large neural networks, machine learning required balancing model complexity (illustrated in the sketch after this list):

  • Simple models: High bias (underfit)
  • Complex models: High variance (overfit)
  • Tradeoff required: Find optimal complexity level
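
A quick, hypothetical illustration of that tradeoff: fitting polynomials of increasing degree to a small noisy dataset, where training error keeps falling but cross-validation error eventually rises. The data and degrees here are made up for demonstration:

```python
import numpy as np

# Hypothetical 1-D dataset: noisy quadratic, alternating train/CV split
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=40))
y = x**2 + rng.normal(scale=1.0, size=40)
x_tr, y_tr = x[::2], y[::2]
x_cv, y_cv = x[1::2], y[1::2]

for degree in (1, 2, 10):
    coeffs = np.polyfit(x_tr, y_tr, degree)   # fit polynomial of given degree
    j_train = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    j_cv = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2)
    # degree 1 underfits (high J_train); degree 10 tends to overfit (J_cv > J_train)
    print(f"degree {degree:2d}: J_train={j_train:.2f}  J_cv={j_cv:.2f}")
```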
Game Changer

Large neural networks on small-to-moderate datasets are low-bias machines - they can almost always fit the training set well if made large enough.
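
A minimal sketch of that claim, using a hypothetical toy dataset: a deliberately oversized Keras network driven to near-zero training error on just 20 examples.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy dataset: 20 examples, 5 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)).astype("float32")
y = rng.integers(0, 2, size=(20, 1)).astype("float32")

# Deliberately oversized network relative to the dataset
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=500, verbose=0)

# J_train typically approaches zero: the large network fits the training set
print("J_train:", model.evaluate(X, y, verbose=0))
```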

The diagnostic recipe:

  1. Train on the training set - measure J_train

  2. Check training performance - does it do well relative to a baseline (e.g., human-level performance)?

    • If NO → High bias → Use a bigger network
    • If YES → Continue to step 3
  3. Check cross-validation performance - does it do well on the CV set?

    • If NO → High variance → Get more data
    • If YES → Done!
This becomes an iterative loop (sketched in code below):

  • Keep making the network bigger until J_train reaches an acceptable level
  • Then collect more data until J_cv approaches J_train
  • Continue until both perform well
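
A minimal sketch of this loop, where `train_model` (returns J_train and J_cv for a given network size) and `get_more_data` are hypothetical helpers, and `baseline` is the target error such as human-level performance:

```python
def diagnose_and_improve(model_size, data, baseline, tol=0.01):
    """Sketch of the bias/variance loop: grow the network until J_train
    is acceptable, then add data until J_cv approaches J_train."""
    while True:
        # Hypothetical helper: trains a network of the given size and
        # returns training and cross-validation errors
        j_train, j_cv = train_model(model_size, data)

        if j_train > baseline + tol:
            # High bias: cannot even fit the training set well
            model_size *= 2              # use a bigger network
        elif j_cv > j_train + tol:
            # High variance: fits training set but not the CV set
            data = get_more_data(data)   # collect more data
        else:
            return model_size, data      # both J_train and J_cv look good
```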

This escapes the traditional tradeoff:

  • Address bias by increasing network size
  • Address variance by adding more data
  • No need to sacrifice one for the other

Systematic approach:

  • Clear diagnostic criteria
  • Specific remedies for each problem
  • Measurable progress indicators
Practical caveats:

  • Bigger networks: More expensive to train
  • Hardware requirements: GPUs become essential for large models
  • Training time: Can become infeasible beyond a certain point
  • More data: Sometimes hard to obtain
  • Collection costs: Can be prohibitive
  • Quality concerns: Samples need to be representative

Standard neural network cost function:

  • Average loss over the training set (squared error or logistic loss)
  • Plus a regularization term: λ/(2m) × the sum of all squared weights w²
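
Written out, with m training examples, loss L, and prediction f_{w,b}(x), this is:

$$J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}),\, y^{(i)}\right) + \frac{\lambda}{2m} \sum_{\text{all weights } w} w^{2}$$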
regularized_network.py

```python
# TensorFlow regularization example: L2 penalty on each layer's weights
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(15, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])
```
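
One detail worth flagging: Keras's l2(0.01) adds 0.01 × Σw² per layer - there is no 1/(2m) factor or factor of 1/2 - so the argument to l2(...) is not literally λ/(2m) unless you fold those constants in yourself. A quick check with hypothetical weight values:

```python
import numpy as np
import tensorflow as tf

# Keras L2 computes factor * sum(w**2) over the layer's weights
w = np.array([[1.0, -2.0], [3.0, 0.5]], dtype="float32")
penalty = tf.keras.regularizers.l2(0.01)(tf.constant(w))

print(float(penalty))        # 0.01 * (1 + 4 + 9 + 0.25) = 0.1425
print(0.01 * np.sum(w**2))   # same value, computed by hand
```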
Practical guidelines:

  • Fight variance more often than bias (if the network is large enough)
  • Regularize appropriately to maintain performance
  • Scale up systematically rather than randomly
  • Computational budget: Plan for larger networks
  • Data collection: Prioritize it when variance is the limiting factor
  • Time investment: Bigger networks mean longer training

Neural networks fundamentally changed how we think about bias-variance tradeoffs, providing a systematic path to better performance without traditional compromises.