
Binary Labels

“Many important applications of recommender systems or collaborative filtering algorithms involve binary labels where, instead of a user giving you a one to five star or zero to five star rating, they just somehow give you a sense of whether they liked or did not like an item.”

Binary collaborative filtering uses:

  • 1: “The user liked or engaged with a particular movie”
  • 0: User did not engage after being shown the item
  • ?: “The user has not yet seen the item”
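
As a concrete sketch of this label scheme (NumPy, with made-up values; the names Y and R follow the r(i,j) notation used in the cost function below, and ? is stored as NaN):

```python
import numpy as np

# Rows = items, columns = users.
# 1 = engaged, 0 = shown but did not engage, np.nan = not yet shown ("?").
Y = np.array([
    [1.0,    0.0,    np.nan],
    [np.nan, 1.0,    0.0   ],
    [0.0,    np.nan, 1.0   ],
])

# R(i, j) = 1 where a label exists; training only sums over these pairs.
R = ~np.isnan(Y)
print(R.astype(int))
```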

Label 1 could mean:

  • “Alice watched the movie Love at Last all the way to the end”
  • “She explicitly hit like or favorite on an app”
  • User spent sufficient time with content

Label 0 could mean:

  • “After playing a few minutes of Nonstop Car Chases, the user decided to stop the video”
  • “Did not hit like” after being exposed to the item

The same 1/0/? pattern applies across applications:

  • Purchase behavior: “Whether or not user j chose to purchase an item after they were exposed to it”
    • 1 = purchased after exposure
    • 0 = did not purchase after exposure
    • ? = not shown the item
  • Engagement actions: “Did the user favorite or like an item after they were shown it”
  • Time-based engagement: “If a user spends at least 30 seconds with an item,” assign label 1
  • Click behavior: “Seeing that the user clicked on an item” (common in online advertising)

General Pattern

Label 1: “The user engaged after being shown an item” (clicked, spent time, favorited, purchased)

Label 0: “The user did not engage after being shown the item”

Question mark: “The item has not yet been shown to the user”

prediction = w^(j) · X^(i) + b^(j)
P(y^(i,j) = 1) = g(w^(j) · X^(i) + b^(j))
where g(z) = 1/(1 + e^(-z))

“This is the logistic function just like we saw in logistic regression.”

The model now predicts “the probability of y^(i,j) being 1, that is, of the user having engaged with or liked the item.”
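
A minimal sketch of this prediction for a single user–item pair, assuming w_j, x_i, and b_j are already-learned parameters (illustrative values, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), the logistic function
    return 1.0 / (1.0 + np.exp(-z))

def predict_engagement(w_j, x_i, b_j):
    # P(y(i,j) = 1) = g(w(j) . x(i) + b(j))
    return sigmoid(np.dot(w_j, x_i) + b_j)

# Made-up learned parameters for user j and item i
w_j = np.array([0.8, -0.2])   # user j's parameter vector
x_i = np.array([1.5, 0.3])    # item i's feature vector
b_j = -0.5

print(predict_engagement(w_j, x_i, b_j))  # probability the user engages
```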

Previous cost function used squared error for continuous ratings.

The new cost function is the “binary cross entropy cost function”:

Loss(f^(i,j), y^(i,j)) = -y^(i,j) * log(f^(i,j)) - (1 - y^(i,j)) * log(1 - f^(i,j))
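
A small sketch of this loss for a single (i, j) pair; the clip is an implementation detail added here to avoid log(0), not something from the notes:

```python
import numpy as np

def binary_cross_entropy(f, y, eps=1e-12):
    # Loss = -y * log(f) - (1 - y) * log(1 - f)
    f = np.clip(f, eps, 1.0 - eps)  # guard against log(0)
    return -y * np.log(f) - (1.0 - y) * np.log(1.0 - f)

print(binary_cross_entropy(0.9, 1))  # small loss: confident and correct
print(binary_cross_entropy(0.9, 0))  # large loss: confident and wrong
```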

Complete Binary Collaborative Filtering Cost

J(w,b,X) = Σ[(i,j): r(i,j)=1] Loss(f^(i,j), y^(i,j)) + regularization

Where:

  • f^(i,j) = g(w^(j) · X^(i) + b^(j))
  • Loss uses the binary cross-entropy function
  • Regularization terms (on the w^(j) and X^(i) parameters) remain the same as in the previous squared-error approach
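
Putting the pieces together, a hedged end-to-end sketch of the full cost, vectorized over all pairs with r(i,j) = 1; the function name, parameter shapes, and lambda_ are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R, lambda_):
    # X: (num_items, num_features)   item feature vectors x^(i)
    # W: (num_users, num_features)   user parameter vectors w^(j)
    # b: (1, num_users)              user bias terms b^(j)
    # Y: (num_items, num_users)      binary labels
    # R: (num_items, num_users)      1 where a label was observed, else 0
    F = sigmoid(X @ W.T + b)             # f^(i,j) = g(w^(j) . x^(i) + b^(j))
    F = np.clip(F, 1e-12, 1 - 1e-12)     # numerical guard against log(0)
    Yc = np.where(R == 1, Y, 0.0)        # neutralize unobserved entries
    loss = -Yc * np.log(F) - (1 - Yc) * np.log(1 - F)
    J = np.sum(R * loss)                 # sum only over pairs with r(i,j) = 1
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))  # regularization
    return J
```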

Binary labels are often easier to collect than explicit ratings:

  • Users naturally engage/don’t engage with content
  • Implicit feedback from user behavior
  • No need to ask users for explicit ratings
  • Can work with click-through data, time spent, purchases, etc.

The transition from linear regression-like collaborative filtering to logistic regression-like collaborative filtering follows the same mathematical principles but opens up recommender systems to a much broader range of practical applications where explicit ratings aren’t available.