“Back in the first course, you have seen how for linear regression, feature normalization can help the algorithm run faster.” For recommender systems with “numbers y such as movie ratings from one to five or zero to five stars, it turns out your algorithm will run more efficiently and also perform a bit better if you first carry out mean normalization.”
Mean normalization means that you “normalize the movie ratings to have a consistent average value.”
To see why this helps, consider adding “a fifth user Eve who has not yet rated any movies.”
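For a user with no ratings, the only term in the cost function that involves that user's parameters is the regularization term, so minimizing the cost drives w^(5) to [0, 0]; b^(5) stays at its initial value, assumed here to be 0.
With these all-zero parameters, the algorithm predicts for every movie i: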
rating = w^(5) · x^(i) + b^(5) = 0
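A minimal sketch of this failure mode, assuming NumPy and two features per movie. The feature values in `X` are made up for illustration and are not from the lecture; only the zero parameters for the new user follow the text.

```python
import numpy as np

# Hypothetical feature vectors x^(i) for five movies (two features each).
# These values are illustrative assumptions, not learned parameters.
X = np.array([
    [0.9, 0.1],
    [1.0, 0.0],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.0, 1.0],
])

# Parameters for the new user (user 5): with no ratings, regularization
# pushes w to zero, and b is assumed to be initialized to zero.
w_new = np.zeros(2)
b_new = 0.0

# Predicted rating for every movie: w · x^(i) + b
preds = X @ w_new + b_new
print(preds)
```

Whatever the movie features are, the dot product with a zero vector is zero, so every prediction for the new user comes out as zero stars.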
For each movie i, compute μᵢ, “the average rating that was given”:
“Averaging over just the users that did rate that particular movie.”
Transform original ratings Y(i,j) by subtracting movie means:
Y_norm(i,j) = Y(i,j) - μᵢ
Example transformations:
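As one illustration, the sketch below applies the transformation to a small ratings matrix using NumPy. The matrix `Y` is a hypothetical example (rows are movies, columns are users, `np.nan` marks a missing rating), chosen so that μ₁ = 2.5 as in the text; the names `R`, `mu`, and `Y_norm` are assumptions for this sketch.

```python
import numpy as np

# Hypothetical ratings: rows = movies, columns = users (last column is Eve).
# np.nan marks "not rated". These numbers are illustrative only.
Y = np.array([
    [5.0,    5.0,    0.0,    0.0,    np.nan],
    [5.0,    np.nan, np.nan, 0.0,    np.nan],
    [np.nan, 4.0,    0.0,    np.nan, np.nan],
    [0.0,    0.0,    5.0,    4.0,    np.nan],
    [0.0,    0.0,    5.0,    0.0,    np.nan],
])

# R[i, j] is True when user j has rated movie i.
R = ~np.isnan(Y)

# Per-movie mean, averaging only over users who rated that movie.
counts = R.sum(axis=1)
sums = np.where(R, Y, 0.0).sum(axis=1)
mu = sums / np.maximum(counts, 1)  # guard against a movie with no ratings

# Subtract each movie's mean from its rated entries; keep missing as NaN.
Y_norm = np.where(R, Y - mu[:, None], np.nan)

print(mu)         # per-movie means
print(Y_norm[0])  # normalized ratings for movie 1
```

With these assumed ratings, movie 1 has ratings 5, 5, 0, 0 and mean 2.5, so its normalized ratings become 2.5, 2.5, -2.5, -2.5, while Eve's entry stays missing.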
When making predictions, add back the mean:
prediction = w^(j) · x^(i) + b^(j) + μᵢ
For the new user with w^(5) = [0, 0] and b^(5) = 0:
prediction for movie 1 = w^(5) · x^(1) + b^(5) + μ₁ = 0 + 2.5 = 2.5
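The same arithmetic can be sketched in a few lines of NumPy. The helper name `predict` and the feature vector `x_1` are hypothetical; the zero parameters and μ₁ = 2.5 come from the text above.

```python
import numpy as np

def predict(w, b, x, mu_i):
    """Predicted rating with the movie mean added back: w · x + b + mu_i."""
    return float(np.dot(w, x) + b + mu_i)

# New user's parameters, as in the text: w^(5) = [0, 0], b^(5) = 0.
w_eve = np.zeros(2)
b_eve = 0.0

# Hypothetical features for movie 1; any values give the same result here,
# because the dot product with a zero vector is zero.
x_1 = np.array([0.9, 0.1])
mu_1 = 2.5

pred = predict(w_eve, b_eve, x_1, mu_1)
print(pred)  # 2.5
```

Because the learned part of the prediction is zero for a user with no ratings, the output falls back to the movie's average rating.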
Improvement
“This seems more reasonable: to think Eve is likely to rate this movie 2.5 rather than think Eve will rate all movies zero stars just because she hasn’t rated any movies yet.”
“The effect of this algorithm is it will cause the initial guesses for the new user Eve to be just equal to the mean of whatever other users have rated these five movies.”
“By normalizing the mean of the different movies’ ratings to be zero, the optimization algorithm for the recommender system will also run just a little bit faster.”
“It does make the algorithm behave much better for users who have rated no movies or very small numbers of movies. And the predictions will become more reasonable.”
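Here the rows of the ratings matrix are movies and the columns are users, so normalizing the rows means subtracting each movie's mean. “Normalizing the rows so that you can give reasonable ratings for a new user seems more important than normalizing the columns,” which would instead help with a movie that has no ratings yet.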
For new movies: “If there’s a brand new movie that no one has rated yet, you probably shouldn’t show that movie to too many users initially because you don’t know that much about that movie.”
Mean normalization “makes the algorithm run a little bit faster” but “even more important, it makes the algorithm give much better, much more reasonable predictions when there are users that rated very few movies or even no movies at all.”
This implementation detail “will make your recommender system work much better” by providing sensible default predictions based on average movie ratings rather than zero ratings.