Collaborative Filtering Programming

Programming Assignment: Collaborative Filtering Recommender Systems

Lab Overview

Objective: Implement collaborative filtering to build a movie recommender system using the real MovieLens dataset.

Dataset Details

Source: MovieLens “ml-latest-small” dataset
Original size: 9000 movies rated by 600 users
Reduced dataset: nu = 443 users, nm = 4778 movies
Rating scale: 0.5 to 5, in steps of 0.5
Focus: only movies from 2000 onward

Key Notation Reference

Symbol	Description	Python Variable
r(i,j)	1 if user j rated movie i, 0 otherwise	R
y(i,j)	Rating given by user j on movie i	Y
w^(j)	Parameter vector for user j	W (row j)
b^(j)	Bias parameter for user j	b (column j)
x^(i)	Feature vector for movie i	X (row i)
nu	Number of users	num_users
nm	Number of movies	num_movies
n	Number of features	num_features
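This is a minimal sketch of how Y and R fit together. It uses a handful of hypothetical (movie, user, rating) triples instead of the real MovieLens files. Rows index movies and columns index users. R marks which entries of Y are actual ratings.

```python
import numpy as np

# Hypothetical (movie index i, user index j, rating) triples.
triples = [(0, 0, 5.0), (0, 2, 0.5), (1, 1, 4.0), (2, 0, 3.5)]
num_movies, num_users = 3, 3

Y = np.zeros((num_movies, num_users))  # y(i,j): rating of movie i by user j
R = np.zeros((num_movies, num_users))  # r(i,j): 1 if user j rated movie i
for i, j, rating in triples:
    Y[i, j] = rating
    R[i, j] = 1

print(Y.shape, int(R.sum()))  # (3, 3) 4
```

The lowest real rating is 0.5, so a 0 in Y can only mean "not rated". Keeping R explicit still avoids relying on that convention.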

Core Algorithm Concept

Collaborative filtering learns the following quantities jointly:

User parameters: a vector w^(j) and a bias b^(j) for each user j
Movie features: a feature vector x^(i) for each movie i
Prediction: the predicted rating of movie i by user j is w^(j) · x^(i) + b^(j)
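The prediction rule can be checked by hand on small hypothetical matrices. This sketch computes one prediction with a dot product. It then computes all predictions at once as X @ W.T + b, which is the same layout the vectorized code uses later. Here b has shape (1, num_users) and broadcasts across movies.

```python
import numpy as np

# Hypothetical sizes: 3 movies, 2 users, 2 features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])      # row i is x^(i)
W = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # row j is w^(j)
b = np.array([[0.5, -0.5]])     # shape (1, num_users): b^(j) per column

# Single prediction for movie i = 2, user j = 0: 4*0.5 + 1*0.5 + 0.5
i, j = 2, 0
single = np.dot(W[j], X[i]) + b[0, j]

# All predictions at once: entry (i, j) of X @ W.T + b
P = X @ W.T + b
print(single, P[i, j])  # 3.0 3.0
```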

Exercise 1: Implement Cost Function

REQUIRED CODE: cofi_cost_func Implementation

You need to implement the collaborative filtering cost function:

import numpy as np

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if movie i was rated by user j
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    ### START CODE HERE ###
    for j in range(nu):
        w = W[j, :]       # parameter vector w^(j)
        b_j = b[0, j]     # bias b^(j)
        for i in range(nm):
            x = X[i, :]   # feature vector x^(i)
            y = Y[i, j]
            r = R[i, j]
            # r = 0 zeroes out the term for movies user j has not rated
            J += np.square(r * (np.dot(w, x) + b_j - y))
    J = J / 2

    # Add regularization (the biases b are not regularized)
    J += (lambda_ / 2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    ### END CODE HERE ###
    return J

Cost Function Formula

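$$
J = \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}\left(\mathbf{w}^{(j)}\cdot\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)}\right)^2
  + \frac{\lambda}{2}\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}\left(w_k^{(j)}\right)^2
  + \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}\left(x_k^{(i)}\right)^2
$$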
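A one-movie, one-user case makes the cost formula above easy to check by hand. Take x = [1, 2], w = [1, 1], b = 0 and y = 2. The prediction is 1·1 + 1·2 + 0 = 3 and the error is 1, so the squared-error term is ½·1² = 0.5. With λ = 1, the penalty adds ½(1² + 1² + 1² + 2²) = 3.5, giving 4.0 in total. The sketch below is a NumPy transcription of the formula. The name cofi_cost_np is hypothetical.

```python
import numpy as np

def cofi_cost_np(X, W, b, Y, R, lambda_):
    """NumPy transcription of the cost formula (sketch)."""
    err = (X @ W.T + b - Y) * R            # zero out unrated entries
    reg = np.sum(W ** 2) + np.sum(X ** 2)  # b is not regularized
    return 0.5 * np.sum(err ** 2) + 0.5 * lambda_ * reg

# One movie, one user, two features: small enough to check by hand.
X = np.array([[1.0, 2.0]])
W = np.array([[1.0, 1.0]])
b = np.array([[0.0]])
Y = np.array([[2.0]])
R = np.array([[1.0]])

print(cofi_cost_np(X, W, b, Y, R, 0.0))  # 0.5
print(cofi_cost_np(X, W, b, Y, R, 1.0))  # 4.0
```

If R is set to 0, the error term disappears entirely. Only the regularization penalty can then contribute.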

Expected Results

Without regularization (λ=0): Cost = 13.67
With regularization (λ=1.5): Cost = 28.09

Vectorized Implementation

A vectorized TensorFlow version is provided. It replaces the nested loops with a single matrix product, which is much faster on the full dataset:

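import tensorflow as tf

def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """Vectorized collaborative filtering cost (same arguments as cofi_cost_func)."""
    # Masked error matrix: entry (i, j) is zero wherever r(i,j) = 0
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    J = 0.5 * tf.reduce_sum(err**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J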
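The loop in cofi_cost_func and the matrix expression in cofi_cost_func_v should give the same number. This sketch checks that on a random problem. NumPy is swapped in for TensorFlow so it runs without TF. The names cost_loop and cost_vec and the random data are hypothetical.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Loop form, mirroring cofi_cost_func."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            J += (R[i, j] * (np.dot(W[j], X[i]) + b[0, j] - Y[i, j])) ** 2
    return J / 2 + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vec(X, W, b, Y, R, lambda_):
    """Matrix form, mirroring cofi_cost_func_v."""
    err = (X @ W.T + b - Y) * R   # b, shape (1, nu), broadcasts down every movie row
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Hypothetical random problem: 6 movies, 4 users, 3 features.
rng = np.random.default_rng(0)
nm, nu, n = 6, 4, 3
X = rng.normal(size=(nm, n))
W = rng.normal(size=(nu, n))
b = rng.normal(size=(1, nu))
R = (rng.random((nm, nu)) < 0.5).astype(float)
Y = rng.choice(np.arange(0.5, 5.5, 0.5), size=(nm, nu)) * R  # unrated entries are 0

for lam in (0.0, 1.5):
    print(lam, np.isclose(cost_loop(X, W, b, Y, R, lam), cost_vec(X, W, b, Y, R, lam)))
```

The key step is broadcasting. Adding b of shape (1, nu) to the (nm, nu) product adds b^(j) to every entry in column j, which is exactly what the inner loop does with b_j.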

Training Process

Parameter Initialization

import tensorflow as tf
from tensorflow import keras

# Set parameters
num_movies, num_users = Y.shape
num_features = 100
tf.random.set_seed(1234)  # for consistent results

# Initialize variables (tf.Variable so the optimizer can update them in place)
W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')

# Set up the optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Custom Training Loop

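iterations = 200
lambda_ = 1

for step in range(iterations):
    # Record the forward pass so TensorFlow can differentiate the cost
    with tf.GradientTape() as tape:
        # Ynorm: ratings with each movie's mean (over rated entries) subtracted
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Automatic differentiation: gradients of the cost w.r.t. the trainable variables
    grads = tape.gradient(cost_value, [X, W, b])

    # One Adam step on X, W and b
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log periodically
    if step % 20 == 0:
        print(f"Training loss at iteration {step}: {cost_value:0.1f}")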
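In the loop above, tape.gradient computes the partial derivatives of J with respect to X, W and b automatically. For this cost they can also be derived by hand. Let E = (X W^T + b - Y) * R, where the product with R is elementwise. Then:

dJ/dX = E W + λX
dJ/dW = E^T X + λW
dJ/db = the column sums of E

This sketch checks one entry against a finite difference. It then runs plain batch gradient descent in NumPy. That optimizer is a substitute for the lab's Adam and GradientTape, not the lab's method. The function cost_and_grads, the learning rate alpha and the random data are all hypothetical.

```python
import numpy as np

def cost_and_grads(X, W, b, Y, R, lambda_):
    """Cost and hand-derived gradients (sketch; assumes R holds only 0s and 1s)."""
    E = (X @ W.T + b - Y) * R                  # masked errors, shape (nm, nu)
    J = 0.5 * np.sum(E ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))
    dX = E @ W + lambda_ * X                   # shape (nm, n)
    dW = E.T @ X + lambda_ * W                 # shape (nu, n)
    db = np.sum(E, axis=0, keepdims=True)      # shape (1, nu); b is not regularized
    return J, dX, dW, db

# Hypothetical small problem: 5 movies, 4 users, 3 features.
rng = np.random.default_rng(1)
nm, nu, n, lambda_ = 5, 4, 3, 1.0
R = (rng.random((nm, nu)) < 0.6).astype(float)
Y = rng.choice(np.arange(0.5, 5.5, 0.5), size=(nm, nu)) * R
X = 0.1 * rng.normal(size=(nm, n))
W = 0.1 * rng.normal(size=(nu, n))
b = np.zeros((1, nu))

# Central-difference check of one entry of dX against the analytic value.
J0, dX, dW, db = cost_and_grads(X, W, b, Y, R, lambda_)
eps = 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[0, 0] += eps
Xm[0, 0] -= eps
numeric = (cost_and_grads(Xp, W, b, Y, R, lambda_)[0]
           - cost_and_grads(Xm, W, b, Y, R, lambda_)[0]) / (2 * eps)
grad_ok = abs(numeric - dX[0, 0]) < 1e-4

# Plain batch gradient descent (swapped in for the lab's Adam optimizer).
alpha = 0.01
for step in range(500):
    _, dX, dW, db = cost_and_grads(X, W, b, Y, R, lambda_)
    X -= alpha * dX
    W -= alpha * dW
    b -= alpha * db
J_final = cost_and_grads(X, W, b, Y, R, lambda_)[0]
print(grad_ok, J0, J_final)
```

Because R is 0/1, every unrated entry contributes nothing to any gradient. A user with no ratings therefore gets no gradient on b^(j), and only the λw^(j) term acts on w^(j).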

Personal Movie Ratings Setup

REQUIRED: Rate Movies for Recommendations

Set your own movie ratings. They are added to Y and R as a new user (column 0) before mean normalization and training:

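import numpy as np

my_ratings = np.zeros(num_movies)  # 0 means "not rated"

# Example ratings (modify these for your preferences)
my_ratings[2700] = 5   # Toy Story 3 (2010)
my_ratings[2609] = 2   # Persuasion (2007)
my_ratings[929]  = 5   # The Lord of the Rings: The Return of the King (2003)
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception (2010)
my_ratings[1150] = 5   # The Incredibles (2004)
# ... add more ratings based on your preferences

# Indices of the movies you rated (used to skip them when recommending)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]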
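The prediction step reads your predictions from column 0. For that to work, your ratings must be prepended to Y, and a matching indicator column to R. This sketch shows the bookkeeping on a hypothetical 4-movie, 2-user matrix rather than the real data. np.c_ treats the 1-D my_ratings as a column when concatenating.

```python
import numpy as np

# Hypothetical toy ratings matrix: 4 movies x 2 existing users.
Y = np.array([[5.0, 0.0],
              [0.0, 3.0],
              [4.0, 2.5],
              [0.0, 0.0]])
R = (Y != 0).astype(int)   # safe because the lowest real rating is 0.5

my_ratings = np.zeros(Y.shape[0])
my_ratings[0] = 5
my_ratings[3] = 2

# Indices of movies the new user rated, used later to skip them.
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

# Prepend the new user as column 0, so their predictions are pm[:, 0].
Y = np.c_[my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]
print(Y.shape, my_rated)   # (4, 3) [0, 3]
```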

Making Predictions

Generate Recommendations

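import numpy as np

# Make predictions using trained weights: p[i, j] = w^(j) · x^(i) + b^(j)
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Restore the mean (denormalize); Ymean has shape (num_movies, 1)
pm = p + Ymean
my_predictions = pm[:, 0]  # column 0 is the new user (you)

# Sort movie indices by predicted rating, highest first
ix = np.argsort(my_predictions)[::-1]

# Show the top predictions, skipping movies you already rated
# (movieList maps a movie index to its title)
for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')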
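The training loop fits Ynorm, and the prediction step adds Ymean back. This sketch shows that normalization, assuming Ymean is each movie's average over rated entries only. The helper name normalize_ratings and all data are hypothetical, not necessarily the lab's utility. The sketch also shows why normalization helps a user with no ratings. For such a user, regularization drives w^(j) to 0. The bias b^(j) gets no gradient and keeps its initial value, assumed here to be 0.3. The denormalized prediction is then "movie mean + constant", so the ranking follows the movie averages instead of being flat.

```python
import numpy as np

def normalize_ratings(Y, R):
    """Per-movie mean over rated entries only; unrated entries stay 0 (sketch)."""
    counts = np.sum(R, axis=1, keepdims=True)
    Ymean = np.sum(Y * R, axis=1, keepdims=True) / np.maximum(counts, 1)
    Ynorm = (Y - Ymean) * R
    return Ynorm, Ymean

# Hypothetical ratings: 3 movies x 3 users; user 2 has rated nothing.
Y = np.array([[5.0, 4.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 0.0]])
R = (Y != 0).astype(float)
Ynorm, Ymean = normalize_ratings(Y, R)

# For user 2: w^(j) -> 0 under regularization; b^(j) keeps its (hypothetical) initial value.
X = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])  # hypothetical learned features
w_new, b_new = np.zeros(2), 0.3
pred_norm = X @ w_new + b_new              # identical for every movie
pred = pred_norm + Ymean.ravel()           # denormalized: movie mean + constant
order = np.argsort(-pred)                  # best-first, following the movie means
print(pred, order)
```

Without normalization the same user would get b_new for every movie, which carries no ranking information.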

Key Implementation Requirements

What You Must Implement

Cost Function: Complete cofi_cost_func() using nested for loops over users and movies, plus the regularization term
Movie Ratings: Fill in the my_ratings array with your own preferences
Understanding: Understand how the training loop uses TensorFlow's automatic differentiation (tf.GradientTape) to compute gradients

Expected Learning Outcomes

After completing this assignment:

Understand how the collaborative filtering algorithm is implemented
Gain experience with TensorFlow custom training loops
See a practical recommender system in action on real movie data
Learn how mean normalization improves predictions for new users