Deep Learning

Deep Learning for Content-Based Filtering

Neural Network Architecture Approach

A good way to develop a content-based filtering algorithm is to use deep learning. The approach you see in this video is the way that many important commercial state-of-the-art content-based filtering algorithms are built today.

Two-Network System

User Network

The first neural network is the user network that takes as input the list of features of the user (x_u) such as age, gender, country of the user, and so on. Using a few layers of dense neural network layers, it outputs the vector v_u that describes the user.

In the example network:

Input: User features (age, gender, country, etc.)
Hidden layers: Dense neural network layers
Output layer: 32 units producing v_u as a list of 32 numbers

Movie Network

Similarly, to compute v_m for a movie, we have a movie network that takes as input features of the movie and through a few layers of a neural network outputs v_m, the vector that describes the movie.

Input: Movie features (year, genre, stars, etc.)
Hidden layers: Dense neural network layers
Output layer: 32 units producing v_m as a list of 32 numbers

Architecture Flexibility

The user network and the movie network can hypothetically have different numbers of hidden layers and different numbers of units per hidden layer. All that matters is the output layer needs to have the same size dimension.

Final Prediction

The rating prediction is computed as: v_u · v_m (dot product)

Unified Network Diagram

Rather than drawing two separate networks, we can represent this as a single neural network:

Upper portion: User network (x_u → v_u)
Lower portion: Movie network (x_m → v_m)
Dot product layer: Combines v_u and v_m to produce prediction

Binary Label Extension

For binary labels (like/dislike rather than star ratings):

Apply sigmoid function to v_u · v_m
Predict probability that y^(i,j) = 1
Use this to predict the probability that user j likes item i

Training the Combined Model

Cost Function

The cost function is very similar to collaborative filtering:

J = Σ[(i,j): r(i,j)=1] (v_u^(j) · v_m^(i) - y^(i,j))²

Where we sum over all pairs i and j where we have labels (r(i,j) = 1).

Training Process

There’s no separate training procedure for the user and movie networks
This cost function is used to train all parameters of both networks simultaneously
The networks are judged according to how well v_u and v_m predict y^(i,j)
Use gradient descent or other optimization algorithms to minimize the cost function

Regularization

For regularization, add the usual neural network regularization term to encourage the networks to keep the values of their parameters small.

Finding Similar Items

After training, the model can find similar items using the learned feature vectors.

Similarity Measurement

v_m^(i): 32-dimensional vector describing movie i
To find movies similar to movie i, look for other movies k where the distance between vectors is small
Use squared distance: ||v_m^(k) - v_m^(i)||²

This is similar to collaborative filtering’s approach of finding movies with similar feature vectors.

Pre-computation Advantage

Neural Network Architecture Benefits

This approach demonstrates one of the benefits of neural networks mentioned when discussing decision trees versus neural networks: it’s easier to take multiple neural networks and put them together to make them work in concert to build a larger system.

The content-based filtering system shows this by:

Taking a user network and movie network
Putting them together
Taking the inner product of their outputs
Creating a more complex architecture that’s quite powerful

Feature Engineering Importance

In practice, developers often end up spending significant time carefully designing the features needed to feed into these content-based filtering algorithms. If building these systems commercially, it may be worth spending time engineering good features for the application.

Computational Limitations

One limitation of the algorithm as described is that it can be computationally very expensive to run if you have a large catalog with many different movies you may want to recommend. This challenge leads to the need for scaling solutions addressed in subsequent approaches.

The deep learning approach to content-based filtering provides a flexible, powerful framework that can incorporate rich user and item features to make sophisticated recommendations while maintaining the ability to find similar items and scale to commercial applications.