Enabling PyTorch acceleration using CUDA

One of the main benefits of PyTorch is its ability to accelerate computation through the use of a graphics processing unit (GPU). Deep learning is a computational task that is easily parallelizable, meaning the calculations can be broken down into smaller tasks and executed across many smaller processors. Instead of executing the whole task on a single CPU, it is therefore far more efficient to perform the calculation on a GPU.

GPUs were originally designed to render graphics efficiently, but as deep learning has grown in popularity, GPUs have been widely adopted for their ability to perform many calculations simultaneously. While a traditional CPU may consist of around four or eight cores, a GPU consists of hundreds or thousands of smaller cores. Because calculations can be executed across all these cores simultaneously, GPUs can dramatically reduce the time taken to perform deep learning tasks.

Consider a single pass within a neural network. We take a small batch of data, pass it through our network to obtain our loss, and then backpropagate, adjusting our parameters according to the gradients. If we have many batches of data to process in this way, on a traditional CPU we must wait until batch 1 has completed before we can compute this for batch 2:

Figure 2.7 – One pass in a neural network

However, on a GPU, we can perform all of these steps simultaneously, meaning there is no requirement for batch 1 to finish before batch 2 can be started. We can calculate the parameter updates for all batches simultaneously and then apply all the updates in one go (as the results are independent of one another). This parallel approach can vastly speed up the machine learning process:

Figure 2.8 – Parallel approach to perform passes
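
To make the idea of a single pass concrete, the following is a minimal sketch of one training step in PyTorch. The model, loss, optimizer, batch size, and feature dimensions here are arbitrary placeholders chosen purely for illustration:

    import torch
    import torch.nn as nn

    # A tiny placeholder model, loss, and optimizer (arbitrary choices for illustration)
    model = nn.Linear(10, 2)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One small batch of dummy data: 32 examples with 10 features each
    inputs = torch.randn(32, 10)
    targets = torch.randint(0, 2, (32,))

    # A single pass: forward, loss, backward, parameter update
    optimizer.zero_grad()             # clear gradients from the previous batch
    outputs = model(inputs)           # forward pass
    loss = loss_fn(outputs, targets)  # compute the loss
    loss.backward()                   # backpropagate to obtain gradients
    optimizer.step()                  # adjust parameters according to the gradients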

Compute Unified Device Architecture (CUDA) is Nvidia's parallel computing platform, which PyTorch uses to enable hardware acceleration on Nvidia GPUs. In order to enable CUDA, we must first make sure that the graphics card on our system is CUDA-compatible. A list of CUDA-compatible GPUs can be found here: https://developer.nvidia.com/cuda-gpus. If you have a CUDA-compatible GPU, then CUDA can be installed from this link: https://developer.nvidia.com/cuda-downloads. We will activate it using the following steps:

  1. Firstly, make sure that PyTorch itself has been installed with CUDA support. The prebuilt binaries available from https://pytorch.org include CUDA-enabled builds; alternatively, PyTorch can be built from source, as described here: https://github.com/pytorch/pytorch#from-source.
  2. Then, to actually use CUDA within our PyTorch code, we must type the following into our Python code:

    cuda = torch.device('cuda')

    This creates a device object that refers to our default CUDA device, 'cuda' (a device-agnostic version of this pattern is sketched after these steps).

  3. We can then execute operations on this device by manually specifying the device argument in any tensor operations:

    x = torch.tensor([5., 3.], device=cuda)

    Alternatively, we can do this by calling the cuda method:

    y = torch.tensor([4., 2.]).cuda()

  4. We can then run a simple operation to ensure this is working correctly:

    x*y

    This results in the following output:

Figure 2.9 – Tensor multiplication output using CUDA
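
In practice, rather than hard-coding the CUDA device, code is commonly written so that it falls back to the CPU when no CUDA-compatible GPU is present. The following is a minimal, device-agnostic sketch of the preceding steps; the tensor values are the same arbitrary ones used above:

    import torch

    # Fall back to the CPU if no CUDA-compatible GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)  # prints 'cuda' on a CUDA-enabled machine, otherwise 'cpu'

    # Create tensors directly on the chosen device...
    x = torch.tensor([5., 3.], device=device)

    # ...or move existing tensors there with .to()
    y = torch.tensor([4., 2.]).to(device)

    print(x * y)  # the multiplication is executed on the chosen device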

The changes in speed will not be noticeable at this stage, as we are only creating a tensor, but when we begin training models at scale later, we will see how parallelizing our computations using CUDA reduces training time by a considerable amount.
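
As a hardware-dependent illustration of where that benefit comes from, the following hypothetical sketch times a large matrix multiplication on the CPU and then on the GPU; the matrix size is arbitrary, and the actual timings will vary from machine to machine:

    import time
    import torch

    size = 4000  # arbitrary size, large enough for the GPU to show a benefit

    # Time a large matrix multiplication on the CPU
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    start = time.time()
    _ = a @ b
    print(f'CPU: {time.time() - start:.3f}s')

    if torch.cuda.is_available():
        # Time the same multiplication on the GPU
        a_gpu, b_gpu = a.cuda(), b.cuda()
        torch.cuda.synchronize()  # ensure the copies to the GPU have finished
        start = time.time()
        _ = a_gpu @ b_gpu
        torch.cuda.synchronize()  # wait for the GPU computation to complete
        print(f'GPU: {time.time() - start:.3f}s')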