Auto Diff Feature
“This is a very powerful feature of TensorFlow called Auto Diff. And some other machine learning packages like PyTorch also support Auto Diff.”
“You might be used to thinking of TensorFlow as a tool for building neural networks. And it is. It’s a great tool for building neural networks.” However, “TensorFlow can also be very helpful for building other types of learning algorithms as well. Like the collaborative filtering algorithm.”
“One of the reasons I like using TensorFlow for tasks like these is that for many applications, in order to implement gradient descent, you need to find the derivatives of the cost function, but TensorFlow can automatically figure out for you what the derivatives of the cost function are.”
“All you have to do is implement the cost function and without needing to know any calculus, without needing to take derivatives yourself, you can get TensorFlow with just a few lines of code to compute that derivative term, that can be used to optimize the cost function.”
Using a simplified cost function: J = (wx - 1)²
Traditional gradient descent update:
w := w - α * (∂J/∂w)
import tensorflow as tf

# Initialize parameter
w = tf.Variable(3.0)
x = 1.0
y = 1.0          # target value
alpha = 0.01     # learning rate
iterations = 30

# Gradient descent loop
for iter in range(iterations):
    # Record the operations used to compute the cost J,
    # so TensorFlow can compute the gradient automatically
    with tf.GradientTape() as tape:
        f = w * x
        J = (f - y) ** 2

    # Automatic differentiation: dJ/dw
    dJdw = tape.gradient(J, w)

    # Update parameter: w := w - alpha * dJ/dw
    w.assign_add(-alpha * dJdw)
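As a quick sanity check (a hedged sketch, not part of the lecture), the derivative of this simplified cost can be worked out by hand as ∂J/∂w = 2x(wx - y), and Auto Diff’s output can be compared against it:

import tensorflow as tf

# Compare Auto Diff against the hand-derived gradient 2x(wx - y),
# using fresh values of w, x, y for a self-contained check
w = tf.Variable(3.0)
x = 1.0
y = 1.0

with tf.GradientTape() as tape:
    J = (w * x - y) ** 2

dJdw_autodiff = tape.gradient(J, w)           # computed automatically by Auto Diff
dJdw_by_hand = 2 * x * (w.numpy() * x - y)    # analytic derivative 2x(wx - y)

print(dJdw_autodiff.numpy(), dJdw_by_hand)    # both print 4.0 for w = 3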
from tensorflow import keras

# Specify optimizer (specified_value: the chosen learning rate)
optimizer = keras.optimizers.Adam(learning_rate=specified_value)

# Training loop for 200 iterations
for iteration in range(200):
    with tf.GradientTape() as tape:
        # Compute cost function
        J = cost_function(x, w, b, y_norm, r, num_users, num_movies, lambda_)

    # Get gradients of the cost with respect to the parameters
    grads = tape.gradient(J, [x, w, b])

    # Apply gradients to update x, w, b using the Adam update rule
    optimizer.apply_gradients(zip(grads, [x, w, b]))
The collaborative filtering cost function takes as inputs the parameters x, w, and b, the normalized ratings y_norm, the indicator matrix r of which movies each user has rated, num_users, num_movies, and the regularization parameter lambda_.
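The notes don’t show the body of cost_function itself; the following is a minimal sketch of what a vectorized version could look like, assuming x has shape (num_movies, num_features), w has shape (num_users, num_features), b has shape (1, num_users), and y_norm and r are (num_movies, num_users) matrices. The names and shapes here are assumptions for illustration, not taken from the lab code.

import tensorflow as tf

def cost_function(x, w, b, y_norm, r, num_users, num_movies, lambda_):
    # Hypothetical vectorized collaborative filtering cost.
    # num_users and num_movies are accepted to match the call above,
    # but this vectorized form doesn't need them directly.

    # Predicted rating for every (movie, user) pair
    predictions = tf.linalg.matmul(x, tf.transpose(w)) + b

    # Only count the error where a rating actually exists (r = 1)
    errors = (predictions - y_norm) * r

    # Squared-error term plus L2 regularization on w and x
    J = 0.5 * tf.reduce_sum(errors ** 2) \
        + (lambda_ / 2) * (tf.reduce_sum(w ** 2) + tf.reduce_sum(x ** 2))
    return J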
“With TensorFlow and Auto Diff you’re not limited to just gradient descent. You can also use a more powerful optimization algorithm like the Adam optimizer.”
Traditional gradient descent updates all parameters:
w := w - α * (∂J/∂w)
b := b - α * (∂J/∂b)
x := x - α * (∂J/∂x)
With the Adam optimizer, TensorFlow handles these more complex update rules automatically.
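For contrast, here is a minimal self-contained sketch (not from the lecture) of the three manual per-parameter updates that Adam’s apply_gradients replaces. The shapes and the placeholder cost below are made up purely for illustration; in the lab, x, w, b are the collaborative filtering parameters.

import tensorflow as tf

alpha = 0.01   # learning rate for the manual updates
x = tf.Variable(tf.random.normal((5, 3)))
w = tf.Variable(tf.random.normal((4, 3)))
b = tf.Variable(tf.random.normal((1, 4)))

with tf.GradientTape() as tape:
    # Placeholder quadratic cost, just so there is something to differentiate
    J = 0.5 * tf.reduce_sum((tf.linalg.matmul(x, tf.transpose(w)) + b) ** 2)

dJdx, dJdw, dJdb = tape.gradient(J, [x, w, b])

x.assign_sub(alpha * dJdx)   # x := x - alpha * dJ/dx
w.assign_sub(alpha * dJdw)   # w := w - alpha * dJ/dw
b.assign_sub(alpha * dJdb)   # b := b - alpha * dJ/db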
“The data set you use in the practice lab is a real data set comprising actual movies rated by actual people. This is the MovieLens dataset and it’s due to Harper and Konstan.”
“That’s why we had to implement it this other way, where we would implement the cost function ourselves, but then use TensorFlow’s tools for automatic differentiation, also called Auto Diff, and use TensorFlow’s implementation of the Adam optimization algorithm to let it do a lot of the work for us of optimizing the cost function.”
“If the model you have is a sequence of dense neural network layers or other types of layers supported by TensorFlow, then the old implementation recipe of model compile, model fit works.”
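For comparison, a minimal sketch of that standard compile/fit recipe (layer sizes and the random training data below are illustrative placeholders, not from the lecture):

import numpy as np
from tensorflow import keras

# Placeholder training data, just to make the sketch runnable
X_train = np.random.rand(100, 4)
y_train = np.random.rand(100, 1)

# Standard recipe when the model is a sequence of supported layers
model = keras.Sequential([
    keras.layers.Dense(25, activation='relu'),
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(1)
])
model.compile(loss='mse', optimizer=keras.optimizers.Adam(learning_rate=0.001))
model.fit(X_train, y_train, epochs=10)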
“But even when it isn’t, these tools in TensorFlow give you a very effective way to implement other learning algorithms as well.”
Auto Diff vs Auto Grad: “Sometimes you hear people call this Auto Grad. The technically correct term is Auto Diff, and Auto Grad is actually the name of the specific software package for doing automatic differentiation, for taking derivatives automatically.”
TensorFlow’s automatic differentiation capability makes implementing learning algorithms like collaborative filtering much more accessible, since it removes the need to work out derivatives by hand.