Pablo Rodriguez

Bias Variance Quiz

If the model’s cross validation error J_cv is much higher than the training error J_train, this is an indication that the model has…

  • High bias
  • Low variance
  • High variance ✓
  • Low bias

Answer Location: When J_cv >> J_train, this indicates high variance (overfitting). Found in Section 1: “the gap between training error and cross-validation error, and if this is high then you will conclude you have a high variance problem.”
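The gap-based rule above can be sketched in a few lines of Python. This is a minimal illustration, not course code; the threshold `gap_tol` is an assumed example value.

```python
# Hedged sketch: diagnose high variance from the gap between
# cross-validation error and training error.
# `gap_tol` is an illustrative threshold, not a value from the course.
def has_high_variance(j_train, j_cv, gap_tol=0.05):
    """Return True when J_cv is much higher than J_train (overfitting)."""
    return (j_cv - j_train) > gap_tol

# Example: training error 2%, CV error 20% -> large gap -> high variance.
print(has_high_variance(0.02, 0.20))  # True
print(has_high_variance(0.05, 0.06))  # False: errors are close together
```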

Which of these is the best way to determine whether your model has high bias (has underfit the training data)?

  • Compare the training error to the baseline level of performance ✓
  • See if the training error is high (above 15% or so)
  • See if the cross validation error is high compared to the baseline level of performance
  • Compare the training error to the cross validation error

Answer Location: Found in Section 2: “Rather than just asking is my training error a lot, you can ask is my training error large relative to what I hope I can get to eventually, such as, is my training error large relative to what humans can do on the task?”
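The point of this question, comparing training error to a baseline rather than to an absolute cutoff, can be sketched as follows. The tolerance `bias_tol` is an assumed example value, not from the course.

```python
# Hedged sketch: high bias means the training error is large *relative to*
# a baseline (e.g. human-level performance), not large in absolute terms.
# `bias_tol` is an illustrative threshold, not a course value.
def has_high_bias(j_train, baseline, bias_tol=0.05):
    """Return True when training error is well above the baseline."""
    return (j_train - baseline) > bias_tol

# A 12% training error is acceptable if humans also get ~10% (noisy task),
# but it signals underfitting when the baseline is ~1%.
print(has_high_bias(0.12, baseline=0.10))  # False
print(has_high_bias(0.12, baseline=0.01))  # True
```

This is why "see if the training error is high (above 15% or so)" is the wrong answer: the same 12% training error is diagnosed differently depending on the baseline.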

You find that your algorithm has high bias. Which of these seem like good options for improving the algorithm’s performance? (Select two correct answers)

  • ☐ Remove examples from the training set
  • ☑ Collect additional features or add polynomial features ✓
  • ☑ Decrease the regularization parameter λ (lambda) ✓
  • ☐ Collect more training examples

Answer Location: Found in Section 4: “if your algorithm has high bias, the main fixes are to make your model more powerful or to give them more flexibility to fit more complex or more wiggly functions. Some ways to do that are to give it additional features or add these polynomial features, or to decrease the regularization parameter Lambda.”
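One of the two fixes named here, adding polynomial features, is easy to show concretely. This is a minimal hand-rolled sketch (the helper `add_polynomial_features` is a hypothetical name, not a library function); in practice you would pair the expanded features with a smaller regularization parameter λ.

```python
# Hedged sketch: expand a 1-D feature into polynomial terms so the model
# can fit more complex ("wiggly") functions -- one of the high-bias fixes.
def add_polynomial_features(x, degree):
    """Expand each scalar xi into [xi, xi^2, ..., xi^degree]."""
    return [[xi ** d for d in range(1, degree + 1)] for xi in x]

X = add_polynomial_features([1.0, 2.0, 3.0], degree=3)
print(X)  # [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0], [3.0, 9.0, 27.0]]
```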

You find that your algorithm has a training error of 2%, and a cross validation error of 20% (much higher than the training error). Based on the conclusion you would draw about whether the algorithm has a high bias or high variance problem, which of these seem like good options for improving the algorithm’s performance? (Select two correct answers)

  • ☑ Increase the regularization parameter λ ✓
  • ☐ Reduce the training set size
  • ☑ Collect more training data ✓
  • ☐ Decrease the regularization parameter λ

Answer Location: The large gap (18%) between CV error and training error indicates high variance. From Section 4: “if you find that your algorithm has high variance, then the two main ways to fix that are: either get more training data or simplify your model… either get a smaller set of features or increase the regularization parameter Lambda.”
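The diagnosis-to-fix mapping that questions 3 and 4 exercise can be summarized in one sketch. The thresholds and the wording of the suggestions are illustrative assumptions, not course values.

```python
# Hedged sketch: map the bias/variance diagnosis to the fixes discussed
# in the quiz. `tol` is an illustrative threshold, not a course value.
def suggest_fixes(j_train, j_cv, baseline=0.0, tol=0.05):
    """Return the list of fixes suggested by the bias/variance diagnosis."""
    fixes = []
    if (j_train - baseline) > tol:  # high bias: model too simple
        fixes += ["add features / polynomial features",
                  "decrease lambda"]
    if (j_cv - j_train) > tol:      # high variance: model overfits
        fixes += ["collect more training data",
                  "use a smaller set of features",
                  "increase lambda"]
    return fixes

# Question 4's numbers: J_train = 2%, J_cv = 20% -> high variance fixes only.
print(suggest_fixes(j_train=0.02, j_cv=0.20))
```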