Tuesday, April 9, 2019

Trailhead: Deep Learning and Natural Language Processing

Recently I tackled the Deep Learning and Natural Language Processing Trailhead module.

I can best summarize the experience with an image based on the Suppose you have one rabbit meme explanation of arithmetic:

To be fair, the first unit in the module does call out some prerequisites:

This is an advanced topic, and this module assumes you have a basic understanding of machine learning vocabulary, some experience with Python, and at least a little hands-on experience working with machine learning data and algorithms. If you don’t already have that background, you can get yourself up to speed using the following resources.

I personally think I meet those prerequisites. While I don't work with Python day to day the syntax is familiar enough I figured I can fake it till I make it.

The first two units were reasonably straight forward.

However, by the third unit - Apply Deep Learning to Natural Language Processing it started to get complicated. In particular, with the Hands-on Logistic regression questions 6, and to a lesser extent question 7. These both required completing several TODO lines of Python to derive the loss after 100 epochs.

Let's look at the code from the first part of that that needed to be completed in the TODO sections:

The challenge here is, as with all programming, that all the steps need to be completed successfully before you will get the expected answer. Get any of them wrong and things will go pear shaped fast. I could save you the pain of solving this challenge and provide you the full script, but that doesn't really go with the spirit on Trailhead. Instead I'll add some debugging outputs at various points to show the state of the tensors to hopefully make it clearer what should be happening. At least then if things start to go off track you can pick up the problem immediately.

Exercise 6

TODO: Generate 2 clusters of 100 2d vectors, each one distributed normally, using only two calls of randn()

println(classApoints)
tensor([[ 0.3374, -0.1778],
        [-0.3035, -0.5880],
        [ 0.3486,  0.6603],
        [-0.2196, -0.3792],
        #...
        [-0.7952, -0.9178],
        [ 0.4187, -1.1123],
        [ 1.1227,  0.2646],
        [-0.4698,  1.0866],
        [-0.8892,  0.7647]])
assert(classApoints.size() == torch.Size([100, 2]))
println(classBpoints)
tensor([[ 0.4771,  0.7203],
        [-0.0215,  1.0731],
        [-0.1408, -0.5394],
        [-1.2782, -0.8107],
        #...
        [ 1.1051, -0.5454],
        [ 0.1073,  0.8727],
        [-1.2800, -0.4619],
        [ 1.4342, -1.2103],
        [ 1.3834,  0.0324]])
assert(classBpoints.size() == torch.Size([100, 2]))

TODO: Add the vector [1.0,3.0] to the first cluster and [3.0,1.0] to the second.

println(classApoints)
tensor([[ 1.3374,  2.8222],
        [ 0.6965,  2.4120],
        [ 1.3486,  3.6603],
        [ 0.7804,  2.6208],
        #...
        [ 0.2048,  2.0822],
        [ 1.4187,  1.8877],
        [ 2.1227,  3.2646],
        [ 0.5302,  4.0866],
        [ 0.1108,  3.7647]])
println(classBpoints)
tensor([[ 3.4771,  1.7203],
        [ 2.9785,  2.0731],
        [ 2.8592,  0.4606],
        [ 1.7218,  0.1893],
        #...
        [ 4.1051,  0.4546],
        [ 3.1073,  1.8727],
        [ 1.7200,  0.5381],
        [ 4.4342, -0.2103],
        [ 4.3834,  1.0324]])

TODO: Concatenate these two clusters along dimension 0 so that the points distributed around [1.0, 3.0] all come first

println(inputs)
tensor([[ 1.3374,  2.8222],
        [ 0.6965,  2.4120],
        [ 1.3486,  3.6603],
        [ 0.7804,  2.6208],
        #...
        [ 4.1051,  0.4546],
        [ 3.1073,  1.8727],
        [ 1.7200,  0.5381],
        [ 4.4342, -0.2103],
        [ 4.3834,  1.0324]])

println(inputs.size())
torch.Size([200, 2])

TODO: Create a tensor of target values, 0 for points for the first cluster and # 1 for the points in the second cluster.

println(classA)
tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0])

println(classA.size())
torch.Size([100])
println(classB)
tensor([ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1])

println(classB.size())
torch.Size([100])

# TODO: Initialize a Linear layer to output scores for each class given the 2d examples

println(model)
Linear(in_features=2, out_features=2, bias=True)

# TODO: Define your loss function

Here you need to decide if you are going to use:

  1. MSE Loss for Regression
  2. NLLLoss for Classification
  3. CrossEntropyLoss for Classification

Worst case here, you could just try them each one at a time until you get an answer that matches the expected answers in the Trailhead challenge.

Finishing up exercise 6

After that the rest should fall into place fairly easily based on the prior examples.

Exercise 7: Logistic Regression with a Neural Network

Firstly, the instructions when I completed this included this:

# forward takes self and x as input
#         passes x through linear_to_hidden, then a tanh activation function,
#         and then hidden_to_linear and returns the output

I suspect that should actually be "and then hidden_to_output and returns the output".

The torch.tanh function is required for forward.

Initialize your new network to have in_size 2, hidden_size 6, and out_size 2

You are going to use the NeuralNet class that you just completed defining here.

Define your loss function

As with exercise 6, you can just try the 3 example loss functions to find a good fit for the expected answers.

Finishing up exercise 7

Most of the other parts for this question all fall into place based on the prior example.

Results

It would be accurate to say that this last unit took me way longer than the 90 minutes that Trailhead indicated. But dammit I earned those 100 measly points.