Key Distinction
Content-based filtering requires some features of each user and some features of each item, and it uses those features to decide which items and users might be a good match for each other.
Collaborative filtering recommends items based on the ratings of users who gave ratings similar to yours. Some number of users rate some items, and the algorithm figures out how to use those ratings to recommend new items to you.
Content-based filtering takes a different approach to deciding what to recommend. It recommends items to you based on the features of users and the features of items, looking for a good match between the two.
Both approaches still use the same core rating data:
The key difference is that content-based filtering can make good use of features of the user and of the items to find better matches than a pure collaborative filtering approach might be able to.
Content-based systems can look at past behaviors of the user to construct feature vectors:
If you look at the top thousand movies in your catalog, you might construct a thousand features that tell you, of the thousand most popular movies in the world, which ones the user has watched.
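The watched-movie features described above can be sketched as a 0/1 vector with one entry per popular movie. This is only an illustration: the function name `watched_features`, the movie ids, and the five-movie catalog (standing in for a thousand) are all hypothetical.

```python
import numpy as np

def watched_features(top_movie_ids, watched_ids):
    """Return a 0/1 vector with one entry per movie in top_movie_ids.

    Entry k is 1.0 if the user has watched top_movie_ids[k], else 0.0.
    """
    watched = set(watched_ids)
    return np.array([1.0 if m in watched else 0.0 for m in top_movie_ids])

# Hypothetical catalog of five popular movies (a real one might hold a thousand).
top_movie_ids = ["m1", "m2", "m3", "m4", "m5"]

# "m9" is not in the popular list, so it contributes no feature.
x_watched = watched_features(top_movie_ids, watched_ids=["m2", "m5", "m9"])
print(x_watched.tolist())  # [0.0, 1.0, 0.0, 0.0, 1.0]
```

Movies outside the popular list are simply ignored here, which keeps the vector length fixed regardless of how much a user has watched.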
You can take ratings the user might have already given in order to construct new features:
These features combine to create a user feature vector: x_u^(j) for user j.
Features of each movie similarly create a movie feature vector: x_m^(i) for movie i.
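As a sketch of how ratings a user has already given might turn into entries of the user feature vector x_u^(j), the example below computes an average rating per genre and concatenates it with a small block of watched-movie indicators. The choice of per-genre averages, the genre names, the ratings, and the helper name `average_rating_per_genre` are assumptions made for illustration and are not taken from the notes.

```python
import numpy as np

def average_rating_per_genre(ratings, movie_genres, genres):
    """Average the user's ratings within each genre.

    ratings:      {movie_id: rating} for movies the user has rated
    movie_genres: {movie_id: [genre, ...]}
    genres:       ordered list of genres; fixes the output layout
    """
    totals = {g: 0.0 for g in genres}
    counts = {g: 0 for g in genres}
    for movie_id, rating in ratings.items():
        for g in movie_genres.get(movie_id, []):
            if g in totals:
                totals[g] += rating
                counts[g] += 1
    # Genres with no rated movies get 0.0; this default is an assumption.
    return np.array([totals[g] / counts[g] if counts[g] else 0.0 for g in genres])

# Hypothetical data for one user.
genres = ["romance", "action", "comedy"]
movie_genres = {"m1": ["romance"], "m2": ["action"], "m3": ["romance", "comedy"]}
ratings = {"m1": 5.0, "m2": 1.0, "m3": 4.0}

x_genre = average_rating_per_genre(ratings, movie_genres, genres)
print(x_genre.tolist())  # [4.5, 1.0, 4.0]

# Combine with a (hypothetical) watched-indicator block into one user vector.
x_watched = np.array([1.0, 1.0, 1.0, 0.0])
x_u = np.concatenate([x_watched, x_genre])
print(x_u.shape)  # (7,)
```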
User features and movie features can be very different in size:
Previously in collaborative filtering: w^(j) · x^(i) + b^(j)
In content-based filtering, we eliminate b^(j), replace w^(j) with v_u^(j), and replace x^(i) with v_m^(i):
The prediction becomes: v_u^(j) · v_m^(i)
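To make the change in the prediction concrete, the sketch below places the two forms side by side. The function names and the numeric values are hypothetical. The shape check reflects the fact that a dot product is only defined when v_u^(j) and v_m^(i) have the same length.

```python
import numpy as np

def predict_collaborative(w_j, x_i, b_j):
    """Collaborative filtering prediction: w^(j) . x^(i) + b^(j)."""
    return float(np.dot(w_j, x_i) + b_j)

def predict_content_based(v_u_j, v_m_i):
    """Content-based prediction: v_u^(j) . v_m^(i), with no bias term."""
    if v_u_j.shape != v_m_i.shape:
        raise ValueError("v_u and v_m must have the same length")
    return float(np.dot(v_u_j, v_m_i))

# Hypothetical values chosen so the arithmetic is easy to follow.
w_j = np.array([1.0, 2.0])
x_i = np.array([0.5, 0.25])
print(predict_collaborative(w_j, x_i, b_j=0.5))  # 1.5

v_u_j = np.array([1.0, 0.0, 2.0])
v_m_i = np.array([3.0, 5.0, 0.5])
print(predict_content_based(v_u_j, v_m_i))  # 4.0
```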
If the user vector v_u captures user preferences as [4.9, 0.1, …] and the movie vector v_m is [4.5, 0.2, …], then the dot product hopefully gives a sense of how much this particular user will like this particular movie.
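As a worked illustration of that dot product, the snippet below multiplies only the two components that the notes show for each vector. The elided components behind "…" are not guessed, so the result is a partial score and not the full prediction.

```python
import numpy as np

# Only the two components shown in the notes; the elided ones are left out.
v_u_shown = np.array([4.9, 0.1])
v_m_shown = np.array([4.5, 0.2])

# Per-component products show which entries drive the match.
contributions = v_u_shown * v_m_shown
partial_score = contributions.sum()

print(np.round(contributions, 2).tolist())  # [22.05, 0.02]
print(round(float(partial_score), 2))       # 22.07
```

The first component dominates because both vectors are large there, while the second contributes almost nothing because both are small.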
Collaborative Filtering: A number of users give ratings of different items, and the algorithm learns patterns from rating similarities.
Content-Based Filtering: Features of users and features of items are used to compute vectors v_u for users and v_m for items; dot products between these vectors then identify good matches between users and items.
The challenge is learning how to compute v_u and v_m from the available feature information.
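One minimal way to picture that learning problem is sketched below. It assumes v_u and v_m are plain linear functions of the features (v_u = x_u W_u and v_m = x_m W_m) and fits the two matrices by gradient descent on squared error. The linear form, the synthetic data, the sizes, and the learning rate are all assumptions made for illustration; the notes do not specify how v_u and v_m are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_m, d = 4, 3, 2   # user and movie feature sizes differ; d is shared

# Synthetic training set: one row per observed (user, movie) rating.
X_u = rng.normal(size=(50, n_u))   # user features x_u
X_m = rng.normal(size=(50, n_m))   # movie features x_m

# Targets generated from hidden linear maps so there is something to learn.
A_u = rng.normal(size=(n_u, d))
A_m = rng.normal(size=(n_m, d))
y = np.sum((X_u @ A_u) * (X_m @ A_m), axis=1)

# Parameters to learn: v_u = x_u @ W_u and v_m = x_m @ W_m, both of length d.
W_u = 0.1 * rng.normal(size=(n_u, d))
W_m = 0.1 * rng.normal(size=(n_m, d))

def loss(W_u, W_m):
    """Mean squared error between v_u . v_m and the target ratings."""
    pred = np.sum((X_u @ W_u) * (X_m @ W_m), axis=1)
    return float(np.mean((pred - y) ** 2))

initial_loss = loss(W_u, W_m)
learning_rate = 0.005
for _ in range(2000):
    V_u = X_u @ W_u                          # (50, d)
    V_m = X_m @ W_m                          # (50, d)
    err = np.sum(V_u * V_m, axis=1) - y      # prediction error per rating
    # Gradients of the mean squared error with respect to each matrix.
    grad_u = 2 * X_u.T @ (err[:, None] * V_m) / len(y)
    grad_m = 2 * X_m.T @ (err[:, None] * V_u) / len(y)
    W_u -= learning_rate * grad_u
    W_m -= learning_rate * grad_m

final_loss = loss(W_u, W_m)
print(final_loss < initial_loss)
```

Note that x_u and x_m have different lengths (4 and 3 here), while both are mapped to vectors of the same length d so that the dot product is defined.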