Finding Related Items

“If you come to an online shopping website and you’re looking at a specific item, say maybe a specific book, the website may show you things like, ‘Here are some other books similar to this one’ or if you’re browsing a specific movie, it may say, ‘Here are some other movies similar to this one.’”

The question is: “How do the websites do that, so that when you’re looking at one item, it gives you other similar or related items to consider?”

Using Learned Features for Similarity

Feature Vector Approach

“As part of the collaborative filtering we’ve discussed, you learned features x^(i) for every item i, for every movie i or other type of item they’re recommending to users.”

Feature Interpretability Challenge

While earlier examples used interpretable features like “how much a movie is a romance movie versus an action movie,” in practice “when you use this algorithm to learn the features x^(i) automatically, looking at the individual features x₁, x₂, x₃, you find them to be quite hard to interpret.”

Collective Feature Meaning

“But nonetheless, these learned features, x₁, x₂, x₃ and however many other features you have, collectively do convey something about what that movie is like.”

Distance-Based Similarity

Finding Similar Items

“Given features x^(i) of item i, if you want to find other items, say other movies related to movie i, then what you can do is try to find the item k with features x^(k) that is similar to x^(i).”

Squared Distance Formula

The similarity measure uses squared distance between feature vectors:

distance = Σ(l=1 to n) [x_l^(k) - x_l^(i)]²

This can also be written as: ||x^(k) - x^(i)||²
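
As a quick worked example with made-up numbers (not taken from the lecture), suppose n = 3, x^(i) = (1, 0, 2) and x^(k) = (0, 1, 2). The squared distance then works out as:

```latex
\[
\lVert x^{(k)} - x^{(i)} \rVert^2
  = (0 - 1)^2 + (1 - 0)^2 + (2 - 2)^2
  = 1 + 1 + 0
  = 2
\]
```

A smaller value means the two feature vectors, and so presumably the two items, are more alike.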

“If you find not just the one movie with the smallest distance between x^(k) and x^(i) but find say, the five or 10 items with the most similar feature vectors, then you end up finding five or 10 related items to the item x^(i).”
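
The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names `squared_distance` and `related_items`, the `top_n` parameter, and the feature values are all assumptions made for the example, and it scans every item, which is only practical for small catalogs.

```python
import heapq


def squared_distance(x_k, x_i):
    """Sum over features l of (x_l^(k) - x_l^(i))^2."""
    if len(x_k) != len(x_i):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x_k, x_i))


def related_items(features, i, top_n=5):
    """Return indices of the top_n items closest to item i, nearest first.

    `features` is a list of learned feature vectors, one per item.
    Item i itself is excluded from the result.
    """
    candidates = (
        (squared_distance(x_k, features[i]), k)
        for k, x_k in enumerate(features)
        if k != i
    )
    return [k for _, k in heapq.nsmallest(top_n, candidates)]


# Made-up learned features for four items (illustrative values only).
features = [
    [0.9, 0.1, 0.3],  # item 0
    [0.8, 0.2, 0.3],  # item 1
    [0.1, 0.9, 0.7],  # item 2
    [0.7, 0.1, 0.5],  # item 3
]
print(related_items(features, 0, top_n=2))  # prints [1, 3]
```

Items 1 and 3 come back because their vectors lie closest to item 0, while item 2 sits far away in every feature.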

Practical Application

“If you’re building a website and want to help users find related products to a specific product they are looking at, this would be a nice way to do so, because the features x^(i) give a sense of what item i is about, so other items k with similar features x^(k) will turn out to be similar to item i.”

Building Block for Advanced Systems

“Later this week, this idea of finding related items will be a small building block that we’ll use to get to an even more powerful recommender system as well.”

Limitations of Collaborative Filtering

Cold Start Problem

“One of its weaknesses is that it is not very good at the cold start problem.”

New Items: “If there’s a new item in your catalog, say someone’s just published a new movie and hardly anyone has rated that movie yet, how do you rank the new item if very few users have rated it before?”

New Users: “Similarly, for new users that have rated only a few items, how can we make sure we show them something reasonable?”

Side Information Limitation

“The second limitation of collaborative filtering is it doesn’t give you a natural way to use side information or additional information about items or users.”

Movie Side Information Examples

“What is the genre of the movie”
“Who are the movie stars”
“What is the studio”
“What is the budget”

User Side Information Examples

Demographics: “Their age, gender, location”
Preferences: “If they tell you they like certain movie genres but not other movie genres”
Technical indicators:
- “If you know the user’s IP address, that can tell you a lot about a user’s location”
- “If you know whether the user is accessing your site on a mobile or on a desktop”
- “If you know what web browser they’re using”
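
To make "side information" concrete, here is a rough sketch of how cues like these might be turned into a numeric vector. It is not part of collaborative filtering as described above, which has no input for such a vector. The function names, the device and browser vocabularies, and the age scaling are all illustrative assumptions.

```python
# Hypothetical vocabularies; a real system would derive these from its own data.
DEVICES = ["mobile", "desktop"]
BROWSERS = ["chrome", "firefox", "safari", "edge"]


def one_hot(value, vocabulary):
    """Return a 0/1 list with a 1 at the position of `value` (all zeros if unknown)."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode_user(age, device, browser):
    """Concatenate a scaled age with one-hot device and browser indicators."""
    return [age / 100.0] + one_hot(device, DEVICES) + one_hot(browser, BROWSERS)


print(encode_user(age=30, device="mobile", browser="firefox"))
# prints [0.3, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

An unrecognized device or browser simply yields all zeros for that group, so the vector keeps a fixed length.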

Surprising Correlations

“All of these are little cues you can get. They can be surprisingly correlated with the preferences of a user.”

Browser Example: “Users that use the Chrome versus Firefox versus Safari versus the Microsoft Edge browser, they actually behave in very different ways. Even knowing the user web browser can give you a hint when you have collected enough data of what this particular user may like.”

Transition to Content-Based Filtering

“Even though collaborative filtering, where you have multiple users giving you ratings of multiple items, is a very powerful set of algorithms, it also has some limitations.”

The solution: “Content-based filtering algorithms are a state-of-the-art technique used in many commercial applications today” and “can address a lot of these limitations.”

The distance-based similarity approach provides a foundation for understanding how items relate to each other through their learned feature representations, even when those features aren’t directly interpretable.
