General Pattern
- Label 1: “The user engaged after being shown an item” (clicked, spent time, favorited, purchased)
- Label 0: “The user not engaging after being shown the item”
- Question mark (?): “The item not yet having been shown to the user”
“Many important applications of recommender systems or collaborative filtering algorithms involve binary labels where, instead of a user giving you a one to five star or zero to five star rating, they just somehow give you a sense of whether they like this item or they did not like this item.”
Binary collaborative filtering uses labels 1, 0, and ? in place of numerical star ratings.

Label 1 could mean: the user clicked the item, spent time on it, favorited it, or purchased it.

Label 0 could mean: the user was shown the item but did not click, favorite, or purchase it.
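As a concrete illustration, here is a minimal sketch of how such labels might be stored: a matrix Y of 1/0 entries, with NaN standing in for the question marks, and an indicator matrix R marking which (item, user) pairs actually have labels. The names Y and R follow the notation used for star ratings earlier in the course; the specific values below are made up.

```python
import numpy as np

# Hypothetical binary labels for 4 items x 3 users.
# 1 = engaged, 0 = shown but did not engage, NaN = not yet shown ("?").
Y = np.array([
    [1.0,    0.0,    np.nan],
    [0.0,    np.nan, 1.0],
    [np.nan, 1.0,    1.0],
    [1.0,    0.0,    0.0],
])

# r(i,j) = 1 wherever a label exists; only these entries enter the cost.
R = (~np.isnan(Y)).astype(float)
```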
prediction = w^(j) · X^(i) + b^(j)
P(y^(i,j) = 1) = g(w^(j) · X^(i) + b^(j)), where g(z) = 1/(1 + e^(-z))
“This is the logistic function just like we saw in logistic regression.”
The model now predicts “the probability of y^(i,j) being 1, that is, of the user having engaged with or liked the item.”
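A minimal sketch of this prediction step, assuming NumPy and assuming the item features X, user parameters W, and biases b from the earlier collaborative-filtering notation (the shapes below are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_prob(X, W, b):
    """P(y^(i,j) = 1) for every item i and user j.

    X: (num_items, num_features)  item feature vectors x^(i)
    W: (num_users, num_features)  user parameter vectors w^(j)
    b: (1, num_users)             per-user bias terms b^(j)
    """
    # (num_items, num_users) matrix of engagement probabilities
    return sigmoid(X @ W.T + b)
```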
Previous cost function used squared error for continuous ratings.
New cost function uses “binary cross entropy cost function”:
Loss(f^(i,j), y^(i,j)) = -y^(i,j) · log(f^(i,j)) - (1 - y^(i,j)) · log(1 - f^(i,j))
J(w,b,X) = Σ[(i,j): r(i,j)=1] Loss(f^(i,j), y^(i,j)) + regularization
Where:

- f^(i,j) = g(w^(j) · X^(i) + b^(j)) is the model’s predicted probability
- y^(i,j) is the binary label given by user j on item i
- the sum runs only over pairs (i,j) with r(i,j) = 1, i.e., where a label exists
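A sketch of this cost, reusing the `sigmoid` helper and NumPy import from the prediction snippet above, plus the Y and R matrices from the label example; `lambda_` is a hypothetical regularization strength, and the L2 terms over both W and X follow the regularization pattern from the earlier squared-error cost:

```python
def binary_cost(X, W, b, Y, R, lambda_):
    """Binary cross-entropy cost over labeled pairs, plus L2 regularization."""
    F = sigmoid(X @ W.T + b)             # f^(i,j) for all (item, user) pairs
    Y0 = np.nan_to_num(Y)                # replace ? entries; R masks them out anyway
    loss = -Y0 * np.log(F) - (1.0 - Y0) * np.log(1.0 - F)
    J = np.sum(R * loss)                 # only pairs with r(i,j) = 1 contribute
    J += (lambda_ / 2.0) * (np.sum(W ** 2) + np.sum(X ** 2))
    return J
```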
Binary labels are often easier to collect than explicit ratings: clicks, purchases, favorites, and time spent can all be logged automatically, whereas most users never leave a star rating.
The transition from linear-regression-like collaborative filtering to logistic-regression-like collaborative filtering follows the same mathematical principles, but it opens up recommender systems to a much broader range of practical applications where explicit ratings aren’t available.