
Binary Labels

“Many important applications of recommender systems or collaborative filtering algorithms involve binary labels where, instead of a user giving you a one to five star or zero to five star rating, they just somehow give you a sense of whether they liked or did not like an item.”

Binary collaborative filtering uses:

  • 1: “The user liked or engaged with a particular movie”
  • 0: User did not engage after being shown the item
  • ?: “The user has not yet seen the item”
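
As a concrete sketch of this label scheme (NumPy, with made-up values; the names Y and R follow the r(i,j) notation used in the cost function below, and ? is stored as NaN):

```python
import numpy as np

# Rows = items, columns = users.
# 1 = engaged, 0 = shown but did not engage, np.nan = not yet shown ("?").
Y = np.array([
    [1.0,    0.0,    np.nan],
    [np.nan, 1.0,    0.0   ],
    [0.0,    np.nan, 1.0   ],
])

# R(i, j) = 1 where a label exists; training only sums over these pairs.
R = ~np.isnan(Y)
print(R.astype(int))
```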

Label 1 could mean:

  • “Alice watched the movie Love at Last all the way to the end”
  • “She explicitly hit like or favorite on an app”
  • User spent sufficient time with content

Label 0 could mean:

  • “After playing a few minutes of Nonstop Car Chases, the user decided to stop the video”
  • “Did not hit like” after being exposed to the item

The same 1/0/? pattern applies across applications:

  • Purchase behavior: “Whether or not user j chose to purchase an item after they were exposed to it”
    • 1 = purchased after exposure
    • 0 = did not purchase after exposure
    • ? = not shown the item
  • Engagement actions: “Did the user favorite or like an item after they were shown it”
  • Time-based engagement: “If a user spends at least 30 seconds with an item,” assign label 1
  • Click behavior: “Seeing that the user clicked on an item” (common in online advertising)

General Pattern

Label 1: “The user engaged after being shown an item” (clicked, spent time, favorited, purchased)

Label 0: “The user did not engage after being shown the item”

Question mark: “The item has not yet been shown to the user”

prediction = w^(j) · X^(i) + b^(j)
P(y^(i,j) = 1) = g(w^(j) · X^(i) + b^(j))
where g(z) = 1/(1 + e^(-z))

“This is the logistic function just like we saw in logistic regression.”

The model now predicts “the probability of y^(i,j) being 1, that is, of the user having engaged with or liked the item.”
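
A minimal sketch of this prediction for a single user–item pair, assuming w_j, x_i, and b_j are already-learned parameters (illustrative values, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), the logistic function
    return 1.0 / (1.0 + np.exp(-z))

def predict_engagement(w_j, x_i, b_j):
    # P(y(i,j) = 1) = g(w(j) . x(i) + b(j))
    return sigmoid(np.dot(w_j, x_i) + b_j)

# Made-up learned parameters for user j and item i
w_j = np.array([0.8, -0.2])   # user j's parameter vector
x_i = np.array([1.5, 0.3])    # item i's feature vector
b_j = -0.5

print(predict_engagement(w_j, x_i, b_j))  # probability the user engages
```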

Previous cost function used squared error for continuous ratings.

The new cost function is the “binary cross entropy cost function”:

Loss(f^(i,j), y^(i,j)) = -y^(i,j) * log(f^(i,j)) - (1 - y^(i,j)) * log(1 - f^(i,j))
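
A small sketch of this loss for a single (i, j) pair; the clip is an implementation detail added here to avoid log(0), not something from the notes:

```python
import numpy as np

def binary_cross_entropy(f, y, eps=1e-12):
    # Loss = -y * log(f) - (1 - y) * log(1 - f)
    f = np.clip(f, eps, 1.0 - eps)  # guard against log(0)
    return -y * np.log(f) - (1.0 - y) * np.log(1.0 - f)

print(binary_cross_entropy(0.9, 1))  # small loss: confident and correct
print(binary_cross_entropy(0.9, 0))  # large loss: confident and wrong
```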

Complete Binary Collaborative Filtering Cost

J(w,b,X) = Σ[(i,j): r(i,j)=1] Loss(f^(i,j), y^(i,j)) + regularization

Where:

  • f^(i,j) = g(w^(j) · X^(i) + b^(j))
  • Loss uses the binary cross-entropy function
  • Regularization terms (on the w^(j) and X^(i) parameters) remain the same as in the previous squared-error approach
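
Putting the pieces together, a hedged end-to-end sketch of the full cost, vectorized over all pairs with r(i,j) = 1; the function name, parameter shapes, and lambda_ are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R, lambda_):
    # X: (num_items, num_features)   item feature vectors x^(i)
    # W: (num_users, num_features)   user parameter vectors w^(j)
    # b: (1, num_users)              user bias terms b^(j)
    # Y: (num_items, num_users)      binary labels
    # R: (num_items, num_users)      1 where a label was observed, else 0
    F = sigmoid(X @ W.T + b)             # f^(i,j) = g(w^(j) . x^(i) + b^(j))
    F = np.clip(F, 1e-12, 1 - 1e-12)     # numerical guard against log(0)
    Yc = np.where(R == 1, Y, 0.0)        # neutralize unobserved entries
    loss = -Yc * np.log(F) - (1 - Yc) * np.log(1 - F)
    J = np.sum(R * loss)                 # sum only over pairs with r(i,j) = 1
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))  # regularization
    return J
```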

Binary labels are often easier to collect than explicit ratings:

  • Users naturally engage/don’t engage with content
  • Implicit feedback from user behavior
  • No need to ask users for explicit ratings
  • Can work with click-through data, time spent, purchases, etc.

The transition from linear regression-like collaborative filtering to logistic regression-like collaborative filtering follows the same mathematical principles but opens up recommender systems to a much broader range of practical applications where explicit ratings aren’t available.