Collaborative vs Content-Based Filtering

Fundamental Approach Differences

Collaborative Filtering Approach

Collaborative filtering recommends items to you based on the ratings of users who rated items similarly to you. Given a set of users who have each rated some of the items, the algorithm learns from those ratings how to recommend new items to you.

Content-Based Filtering Approach

Content-based filtering decides what to recommend in a different way. It recommends items based on the features of the user and the features of the item, looking for a good match between the two.

Key Distinction

Content-based filtering requires some features of each user as well as some features of each item, and it uses those features to decide which items and users might be a good match for each other.

Common Data Elements

Both approaches still use the same core rating data:

r(i,j): Whether or not user j has rated item i
y(i,j): The rating that user j gave item i (if defined)
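
As a minimal sketch of how these two quantities relate, the snippet below stores ratings in a small array and derives r(i,j) from it. The toy values, the use of NaN for "not rated", and the helper name `rating` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy data: 3 items (rows, index i) by 4 users (columns, index j).
# np.nan marks a rating that was never given (an assumed encoding).
Y = np.array([
    [5.0, np.nan, 0.0, 4.0],
    [np.nan, 3.0, np.nan, 1.0],
    [2.0, 5.0, 4.0, np.nan],
])

# r(i,j): 1 where user j has rated item i, 0 otherwise.
R = (~np.isnan(Y)).astype(int)


def rating(i, j):
    """Return y(i,j) if it is defined, otherwise None."""
    return float(Y[i, j]) if R[i, j] == 1 else None
```

Here `rating(0, 0)` is defined because user 0 rated item 0, while `rating(0, 1)` returns `None` because r(0,1) is 0.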

The key difference is that content-based filtering can also use features of the users and of the items, which may let it find better matches than a pure collaborative filtering approach could.

User Features in Content-Based Systems

Demographic Features

Age of the user
Gender: One-hot feature whose values indicate whether the user’s self-identified gender is male, female, or unknown
Country: One-hot feature with about 200 possible values for different countries
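
The demographic features above could be encoded roughly as follows. This is a sketch under stated assumptions: the three-country list stands in for the roughly 200 real values, and the function names `one_hot` and `demographic_features` are hypothetical.

```python
import numpy as np

GENDERS = ["male", "female", "unknown"]
COUNTRIES = ["BR", "IN", "US"]  # hypothetical stand-in for ~200 countries


def one_hot(value, vocabulary):
    """Return a vector with a single 1.0 at the position of `value`."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec


def demographic_features(age, gender, country):
    """Concatenate age with the one-hot gender and country encodings."""
    return np.concatenate((
        [float(age)],
        one_hot(gender, GENDERS),
        one_hot(country, COUNTRIES),
    ))


features = demographic_features(34, "female", "IN")
```

With these assumed vocabularies the result has 1 + 3 + 3 = 7 entries: the age, then one slot per gender value, then one slot per country.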

Behavioral Features

Content-based systems can look at past behaviors of the user to construct feature vectors:

Movie Watching History

If you take the thousand most popular movies in your catalog, you can construct a thousand features, one per movie, each indicating whether the user has watched that movie.
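
A watch-history feature of this kind might be built as in the sketch below. The four placeholder titles stand in for the thousand movies, and the choice to ignore titles outside the list is an assumption.

```python
import numpy as np

# Hypothetical stand-in for the top-1000 list; titles are placeholders.
TOP_MOVIES = ["movie_a", "movie_b", "movie_c", "movie_d"]
MOVIE_INDEX = {title: k for k, title in enumerate(TOP_MOVIES)}


def watch_history_features(watched):
    """Return one 0/1 entry per top movie: 1.0 if the user watched it."""
    vec = np.zeros(len(TOP_MOVIES))
    for title in watched:
        k = MOVIE_INDEX.get(title)
        if k is not None:  # titles outside the top list are ignored
            vec[k] = 1.0
    return vec


history = watch_history_features({"movie_b", "movie_d", "obscure_film"})
```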

Genre-Based Ratings

You can also use ratings the user has already given to construct new features:

If you have a set of movies and know the genre of each one, calculate the average rating the user has given per genre:
Of all the romance movies that the user has rated, what was the average rating?
Of all the action movies that the user has rated, what was the average rating?

These features combine to create a user feature vector: x_u^(j) for user j.
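
The per-genre averages can be sketched as below. Several things here are assumptions for illustration: each movie has exactly one genre, a genre the user has never rated gets a default of 0.0, and the titles and function name are hypothetical.

```python
import numpy as np

GENRES = ["romance", "action"]
# Hypothetical catalog; each movie is assumed to have a single genre.
MOVIE_GENRE = {"movie_a": "romance", "movie_b": "romance", "movie_c": "action"}


def genre_average_features(user_ratings, default=0.0):
    """Average the user's ratings within each genre.

    `user_ratings` maps movie title -> rating. Genres with no rated
    movies fall back to `default` (an assumed convention).
    """
    sums = {g: 0.0 for g in GENRES}
    counts = {g: 0 for g in GENRES}
    for title, value in user_ratings.items():
        genre = MOVIE_GENRE.get(title)
        if genre in sums:
            sums[genre] += value
            counts[genre] += 1
    return np.array([
        sums[g] / counts[g] if counts[g] else default for g in GENRES
    ])


genre_avgs = genre_average_features(
    {"movie_a": 5.0, "movie_b": 4.0, "movie_c": 1.0}
)
```

A full x_u^(j) would then be the concatenation of blocks like this one with the demographic and watch-history features.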

Item Features in Content-Based Systems

Movie Feature Examples

Year of the movie
Genre or genres of the movie if known
Critic reviews: Construct one or multiple features to capture something about what the critics are saying about the movie
Average rating: Take user ratings of the movie to construct a feature such as the average rating of this movie

Extended Rating Features

Average rating per country
Average rating per user demographic
Other types of features based on user feedback patterns

These create a movie feature vector: x_m^(i) for movie i.
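
As an illustration, a small movie feature vector could be assembled like this. The genre list, the 0.0 fallback for a movie with no ratings, and the function name `movie_features` are assumptions, and critic-review features are left out.

```python
import numpy as np

GENRES = ["romance", "action", "comedy"]  # hypothetical genre vocabulary


def movie_features(year, genres, ratings):
    """Build [year, one 0/1 entry per genre, average user rating]."""
    genre_vec = np.array([1.0 if g in genres else 0.0 for g in GENRES])
    # Assumed fallback: a movie with no ratings gets an average of 0.0.
    avg = float(np.mean(ratings)) if len(ratings) > 0 else 0.0
    return np.concatenate(([float(year)], genre_vec, [avg]))


x_m = movie_features(1999, {"romance", "comedy"}, [4.0, 5.0, 3.0])
```

Because a movie can belong to several genres, the genre block here allows more than one entry to be 1.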

Feature Vector Flexibility

User features and movie features can be very different in size:

User features could be 1500 numbers
Movie features could be just 50 numbers
This asymmetry is perfectly acceptable

Prediction Mechanism Evolution

From Collaborative to Content-Based

Previously, collaborative filtering predicted the rating of user j for movie i as: w^(j) · x^(i) + b^(j)

In content-based filtering, we drop the bias term b^(j) and rename the two vectors:

w^(j) becomes v_u^(j) (vector computed for user j, where u stands for user)
x^(i) becomes v_m^(i) (vector computed for movie i, where m stands for movie)

Vector Computation Challenge

v_u^(j): List of numbers computed from the features of user j
v_m^(i): List of numbers computed from the features of movie i
Both vectors must be the same size to compute the dot product (e.g., both are 32 numbers)

The prediction becomes: v_u^(j) · v_m^(i)
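
The sketch below shows only the shape bookkeeping: feature vectors of different sizes (1500 and 50, as in the example above) are each mapped to 32 numbers so that a dot product is possible. The random matrices `W_user` and `W_movie` are placeholders and an assumption of this sketch; they are not the learned mapping, which these notes leave open.

```python
import numpy as np

rng = np.random.default_rng(0)

USER_DIM, MOVIE_DIM, VEC_DIM = 1500, 50, 32

# Placeholder maps (assumption): random matrices standing in for whatever
# function is eventually learned to turn features into v_u and v_m.
W_user = rng.normal(size=(VEC_DIM, USER_DIM))
W_movie = rng.normal(size=(VEC_DIM, MOVIE_DIM))

# Stand-in feature vectors x_u^(j) and x_m^(i), random for illustration.
x_u = rng.normal(size=USER_DIM)
x_m = rng.normal(size=MOVIE_DIM)

v_u = W_user @ x_u    # 32 numbers computed from the user features
v_m = W_movie @ x_m   # 32 numbers computed from the movie features

# The prediction is the dot product of the two same-size vectors.
prediction = float(np.dot(v_u, v_m))
```

With random placeholders the prediction itself is meaningless; the point is that x_u and x_m can differ in length while v_u and v_m cannot.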

Conceptual Example

If the user vector v_u captures user preferences as [4.9, 0.1, …]:

First number: How much they like romance movies
Second number: How much they like action movies

And the movie vector v_m is [4.5, 0.2, …]:

First number: How much this is a romance movie
Second number: How much this is an action movie

Then the dot product hopefully gives a sense of how much this particular user will like this particular movie.
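
To make the arithmetic concrete, the snippet below computes the dot product using only the first two numbers of each vector, since the remaining entries are not given. The second movie, `action_movie`, is a hypothetical contrast case that does not appear in the example above.

```python
import numpy as np

# First two entries only: [romance, action]. Remaining entries are omitted.
v_u = np.array([4.9, 0.1])           # user: likes romance, not action
romance_movie = np.array([4.5, 0.2])  # the movie from the example
action_movie = np.array([0.2, 4.5])   # hypothetical contrast case

# 4.9 * 4.5 + 0.1 * 0.2 = 22.05 + 0.02, approximately 22.07
score_romance = float(np.dot(v_u, romance_movie))

# 4.9 * 0.2 + 0.1 * 4.5 = 0.98 + 0.45, approximately 1.43
score_action = float(np.dot(v_u, action_movie))
```

The romance-loving user scores far higher against the romance movie than against the action movie, which is the behavior the dot product is meant to capture.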

Summary Comparison

Collaborative Filtering: A number of users rate different items, and the algorithm learns patterns from similarities in those ratings.

Content-Based Filtering: Features of users and features of items are used to compute a vector v_u for each user and a vector v_m for each item; the dot product of the two indicates how good a match they are.

The challenge is learning how to compute v_u and v_m from the available feature information.