Optimizing network architecture

Once we have calculated the loss of our network, we optimize the weights to reduce the loss and thus improve the accuracy of the algorithm. For the sake of simplicity, we can think of these optimizers as black boxes that take the loss function and all the learnable parameters and nudge the parameters slightly to improve performance. PyTorch provides most of the commonly used optimizers required in deep learning. If you want to explore what happens inside these optimizers and have a mathematical background, I would strongly recommend reading some of the many blog posts that walk through their derivations.

Some of the optimizers that PyTorch provides are as follows; a short sketch after the list shows their common construction pattern:

  • ADADELTA
  • Adagrad
  • Adam
  • SparseAdam
  • Adamax
  • ASGD
  • LBFGS
  • RMSProp
  • Rprop
  • SGD

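All of these optimizers share the same basic interface: they take an iterable of learnable parameters as the first argument, followed by algorithm-specific hyperparameters. The following is a minimal sketch of that pattern; the class names come from torch.optim, while the tiny Linear model and the hyperparameter values are purely illustrative stand-ins, not recommendations:

import torch.nn as nn
import torch.optim as optim

# A tiny stand-in model so the sketch is self-contained.
model = nn.Linear(10, 2)

# Each optimizer receives the model's learnable parameters along with
# algorithm-specific hyperparameters such as the learning rate.
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = optim.Adam(model.parameters(), lr=0.001)
rmsprop = optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99)
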
We will get into the details of some of these algorithms in Chapter 4, Fundamentals of Machine Learning, along with some of their advantages and tradeoffs. Let's walk through some of the important steps in creating any optimizer:

import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.01)

In the preceding example, we created an SGD optimizer that takes all the learnable parameters of your network as its first argument, and a learning rate that determines how large a change can be made to each learnable parameter. In Chapter 4, Fundamentals of Machine Learning, we will get into more detail on learning rates and momentum, which are important parameters of optimizers. Once we create an optimizer object, we need to call zero_grad() inside our training loop, as the parameters would otherwise accumulate the gradients created during the previous backward() call:

for input, target in dataset:
    optimizer.zero_grad()           # clear gradients accumulated in the previous iteration
    output = model(input)           # forward pass
    loss = loss_fn(output, target)  # compute the loss
    loss.backward()                 # backpropagate to compute gradients
    optimizer.step()                # update the learnable parameters

Once we call backward() on the loss, which computes the gradients (the quantities by which the learnable parameters need to change), we call optimizer.step(), which makes the actual changes to our learnable parameters.
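
To see why zero_grad() matters, here is a minimal, self-contained sketch (the single tensor and the toy loss are purely illustrative) showing that repeated backward() calls add their gradients into the .grad attribute rather than overwriting it:

import torch

# A single learnable parameter and a trivial loss, just to observe .grad.
w = torch.randn(1, requires_grad=True)

loss = (2 * w).sum()
loss.backward()
print(w.grad)   # tensor([2.]) -- gradient of 2*w with respect to w

loss = (2 * w).sum()
loss.backward()
print(w.grad)   # tensor([4.]) -- the new gradient was added to the old one

# Zeroing restores a clean slate, which is what optimizer.zero_grad()
# does for every parameter the optimizer manages.
w.grad.zero_()

Calling optimizer.zero_grad() at the start of each iteration, as in the training loop above, prevents exactly this accumulation.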

Now, we have covered most of the components required to help a computer see/recognize images. Let's build a complex deep learning model that can differentiate between dogs and cats to put all the theory into practice.