PyTorch is a fairly new deep learning framework released by Facebook, which reminds me of the JS framework frenzy. But after playing around with PyTorch a bit, I already find it fun.
To keep things short, I liked it because:
 Unlike TensorFlow, it lets me easily print Tensors on the screen (no seriously, this is a big deal for me since I usually take several iterations to get a DL implementation right).
 TensorFlow adds a layer between Python and TensorFlow. TensorFlow even has its own variable scope. That is more abstraction than I want for my experimental interests.
 Interop with NumPy is easy in PyTorch, with the simple `.numpy()` suffix to convert a Tensor to a NumPy array.
 Unlike Torch, it is not in Lua (and doesn’t need the LuaRocks package manager).
 Unlike Caffe2, I don’t have to write C++ code and write build scripts.
PyTorch’s website has a 60 min. blitz tutorial, which is laid out pretty well.
Here is a summary to get you started on PyTorch:
 `torch.Tensor` is your `np.array` (the NumPy array). `torch.Tensor(3,4)` will create a `Tensor` of shape (3,4) with uninitialized values. All the functions are pretty standard; for example, `torch.rand` can be used to generate random Tensors.
 Indexing in Tensors is pretty similar to NumPy as well.
 `.numpy()` converts a Tensor to a NumPy array.
 For the purpose of a compute graph, PyTorch lets you create `Variable`s, which are similar to `placeholder` in TF.
 Creating a compute graph and computing the gradient is pretty easy (and automatic).
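The tensor basics in the list above can be sketched roughly as follows. This is only an illustration: the shapes and variable names are made up for the example, and `torch.from_numpy` is included just to show the reverse conversion.

```python
import numpy as np
import torch

# An uninitialized tensor of shape (3, 4); its contents are arbitrary.
a = torch.Tensor(3, 4)

# A tensor of the same shape filled with uniform random values in [0, 1).
b = torch.rand(3, 4)

# NumPy-style indexing and slicing.
first_row = b[0]       # shape (4,)
second_col = b[:, 1]   # shape (3,)

# Convert to a NumPy array. For CPU tensors the array shares memory
# with the tensor, so in-place changes to one are visible in the other.
b_np = b.numpy()

# And back again, from a NumPy array to a tensor.
c = torch.from_numpy(np.ones((2, 2), dtype=np.float32))

print(a.shape, b_np.shape, first_row.shape, second_col.shape, c.dtype)
```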
This is all it takes to compute the gradient, where `x` is a variable:
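A minimal sketch of such a gradient computation, not the original listing. It assumes a PyTorch release of 0.4 or later, where a plain tensor created with `requires_grad=True` plays the role of `Variable`; the function being differentiated is chosen only for illustration.

```python
import torch

# x is the leaf of the compute graph; requires_grad=True asks autograd
# to track every operation applied to it.
x = torch.ones(2, 2, requires_grad=True)

y = x + 2          # every entry is 3
z = y * y * 3      # every entry is 27
out = z.mean()     # scalar: 27

# Backpropagate from the scalar all the way to x.
out.backward()

# d(out)/dx = (1/4) * 6 * (x + 2), which is 4.5 for every entry.
print(x.grad)
```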

Simply calling the `backward` method on the scalar `out` computes gradients all the way back to `x`. This is amazing!
 For neural networks, there is `nn.Module`, which wraps the boring boilerplate.
 A simple NN implementation for classifying the MNIST dataset, instead of the CIFAR10 dataset that the tutorial uses, is below:
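A minimal sketch of what such a network could look like, not the original listing. It assumes the layer sizes of the tutorial's CIFAR10 network, with the input channels changed to 1 and the flattened size changed to 16 * 4 * 4 for 28x28 images; the class name `Net` and the batch of random inputs are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers are only declared here; forward() decides how they connect.
        self.conv1 = nn.Conv2d(1, 6, 5)     # 1 input channel (grayscale)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)        # 10 digit classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (N, 6, 12, 12)
        x = self.pool(F.relu(self.conv2(x)))   # (N, 16, 4, 4)
        x = x.view(-1, 16 * 4 * 4)             # flatten, like NumPy reshape
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                     # raw class scores (logits)


net = Net()
# A batch of 8 random tensors shaped like MNIST images, to check shapes.
logits = net(torch.rand(8, 1, 28, 28))
print(logits.shape)
```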

As seen, in the `__init__` method we just need to define the various NN layers we are going to use. The `forward` method then just runs the input through them. The `view` method is analogous to the NumPy `reshape` method.
The gradients are computed automatically during the backward pass and are then applied by the optimizer. The code is self-explanatory and fairly easy to understand.
 PyTorch, through the torchvision package, also keeps track of how to retrieve standard datasets such as CIFAR10, MNIST, etc.
 After getting the data from the dataloader, you can proceed to play with it. Below are the rest of the pieces required to complete the implementation (almost all of it is from the tutorial):
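A rough sketch of the remaining pieces, not the original listing. To stay self-contained it makes two loud assumptions: random tensors stand in for the real MNIST data (which would normally come from `torchvision.datasets.MNIST` wrapped in a `DataLoader`), and a single linear layer stands in for the network. The learning rate, momentum, batch size and epoch count are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# ASSUMPTION: stand-in data, 256 random "images" shaped like MNIST with
# random labels. Real data would come from torchvision.datasets.MNIST.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# ASSUMPTION: stand-in model; any nn.Module mapping images to 10 scores works.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

criterion = nn.CrossEntropyLoss()                                # the loss
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)   # the update rule

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()                # clear gradients from the last step
        outputs = net(inputs)                # forward pass
        loss = criterion(outputs, targets)   # scalar loss
        loss.backward()                      # autograd computes all gradients
        optimizer.step()                     # apply them to the parameters
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(loader))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

With random labels the loss is not expected to reach anything meaningful; the point is only the order of the calls inside the loop.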

The `criterion` object is used to compute your loss function. `optim` has a bunch of gradient-based optimization algorithms such as vanilla SGD, Adam, etc. As promised, simply calling the `backward` method on the loss object computes the gradient.
 Assuming you are working through the tutorial, try to solve it for MNIST data instead of CIFAR10.
 Instead of 3-channel (RGB) images of size 32x32 pixels, the MNIST images are single-channel 28x28 pixel images.
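The change in image size matters because it changes the number of features reaching the first fully connected layer. The arithmetic can be sketched as below, assuming the tutorial's architecture of two 5x5 convolutions without padding, each followed by 2x2 max pooling, ending in 16 channels; the helper names `conv_out` and `flattened_features` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size, channels_out=16, kernel=5, pool=2):
    """Features left after two conv + max-pool stages (assumed architecture)."""
    size = image_size
    for _ in range(2):
        size = conv_out(size, kernel)   # 5x5 convolution, no padding
        size = size // pool             # 2x2 max pooling
    return channels_out * size * size


cifar_features = flattened_features(32)   # 32 -> 28 -> 14 -> 10 -> 5
mnist_features = flattened_features(28)   # 28 -> 24 -> 12 -> 8 -> 4

print(cifar_features, mnist_features)     # 400 256
```

So besides changing the first convolution to accept 1 input channel, the input size of the first linear layer drops from 16 * 5 * 5 to 16 * 4 * 4 under these assumptions.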
Overall, I could get to 96% accuracy with the current setup. The complete gist is here.