Hands-On GPU Programming with Python and CUDA
Dr. Brian Tuomanen
Updated: 2021-06-10 19:26:12
Title Page
Dedication
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Why GPU Programming?
Technical requirements
Parallelization and Amdahl's Law
Using Amdahl's Law
The Mandelbrot set
Profiling your code
Using the cProfile module
Summary
Questions
Setting Up Your GPU Programming Environment
Technical requirements
Ensuring that we have the right hardware
Checking your hardware (Linux)
Checking your hardware (Windows)
Installing the GPU drivers
Installing the GPU drivers (Linux)
Installing the GPU drivers (Windows)
Setting up a C++ programming environment
Setting up GCC, Eclipse IDE, and graphical dependencies (Linux)
Setting up Visual Studio (Windows)
Installing the CUDA Toolkit
Installing the CUDA Toolkit (Linux)
Installing the CUDA Toolkit (Windows)
Setting up our Python environment for GPU programming
Installing PyCUDA (Linux)
Creating an environment launch script (Windows)
Installing PyCUDA (Windows)
Testing PyCUDA
Summary
Questions
Getting Started with PyCUDA
Technical requirements
Querying your GPU
Querying your GPU with PyCUDA
Using PyCUDA's gpuarray class
Transferring data to and from the GPU with gpuarray
Basic pointwise arithmetic operations with gpuarray
A speed test
Using PyCUDA's ElementWiseKernel for performing pointwise computations
Mandelbrot revisited
A brief foray into functional programming
Parallel scan and reduction kernel basics
Summary
Questions
Kernels, Threads, Blocks, and Grids
Technical requirements
Kernels
The PyCUDA SourceModule function
Threads, blocks, and grids
Conway's game of life
Thread synchronization and intercommunication
Using the __syncthreads() device function
Using shared memory
The parallel prefix algorithm
The naive parallel prefix algorithm
Inclusive versus exclusive prefix
A work-efficient parallel prefix algorithm
Work-efficient parallel prefix (up-sweep phase)
Work-efficient parallel prefix (down-sweep phase)
Work-efficient parallel prefix — implementation
Summary
Questions
Streams, Events, Contexts, and Concurrency
Technical requirements
CUDA device synchronization
Using the PyCUDA stream class
Concurrent Conway's game of life using CUDA streams
Events
Events and streams
Contexts
Synchronizing the current context
Manual context creation
Host-side multiprocessing and multithreading
Multiple contexts for host-side concurrency
Summary
Questions
Debugging and Profiling Your CUDA Code
Technical requirements
Using printf from within CUDA kernels
Using printf for debugging
Filling in the gaps with CUDA-C
Using the Nsight IDE for CUDA-C development and debugging
Using Nsight with Visual Studio in Windows
Using Nsight with Eclipse in Linux
Using Nsight to understand the warp lockstep property in CUDA
Using the NVIDIA nvprof profiler and Visual Profiler
Summary
Questions
Using the CUDA Libraries with Scikit-CUDA
Technical requirements
Installing Scikit-CUDA
Basic linear algebra with cuBLAS
Level-1 AXPY with cuBLAS
Other level-1 cuBLAS functions
Level-2 GEMV in cuBLAS
Level-3 GEMM in cuBLAS for measuring GPU performance
Fast Fourier transforms with cuFFT
A simple 1D FFT
Using an FFT for convolution
Using cuFFT for 2D convolution
Using cuSolver from Scikit-CUDA
Singular value decomposition (SVD)
Using SVD for Principal Component Analysis (PCA)
Summary
Questions
The CUDA Device Function Libraries and Thrust
Technical requirements
The cuRAND device function library
Estimating π with Monte Carlo
The CUDA Math API
A brief review of definite integration
Computing definite integrals with the Monte Carlo method
Writing some test cases
The CUDA Thrust library
Using functors in Thrust
Summary
Questions
Implementation of a Deep Neural Network
Technical requirements
Artificial neurons and neural networks
Implementing a dense layer of artificial neurons
Implementation of the softmax layer
Implementation of the cross-entropy loss
Implementation of a sequential network
Implementation of inference methods
Gradient descent
Conditioning and normalizing data
The Iris dataset
Summary
Questions
Working with Compiled GPU Code
Launching compiled code with Ctypes
The Mandelbrot set revisited (again)
Compiling the code and interfacing with Ctypes
Compiling and launching pure PTX code
Writing wrappers for the CUDA Driver API
Using the CUDA Driver API
Summary
Questions
Performance Optimization in CUDA
Dynamic parallelism
Quicksort with dynamic parallelism
Vectorized data types and memory access
Thread-safe atomic operations
Warp shuffling
Inline PTX assembly
Performance-optimized array sum
Summary
Questions
Where to Go from Here
Furthering your knowledge of CUDA and GPGPU programming
Multi-GPU systems
Cluster computing and MPI
OpenCL and PyOpenCL
Graphics
OpenGL
DirectX 12
Vulkan
Machine learning and computer vision
The basics
cuDNN
TensorFlow and Keras
Chainer
OpenCV
Blockchain technology
Summary
Questions
Assessment
Chapter 1, Why GPU Programming?
Chapter 2, Setting Up Your GPU Programming Environment
Chapter 3, Getting Started with PyCUDA
Chapter 4, Kernels, Threads, Blocks, and Grids
Chapter 5, Streams, Events, Contexts, and Concurrency
Chapter 6, Debugging and Profiling Your CUDA Code
Chapter 7, Using the CUDA Libraries with Scikit-CUDA
Chapter 8, The CUDA Device Function Libraries and Thrust
Chapter 9, Implementation of a Deep Neural Network
Chapter 10, Working with Compiled GPU Code
Chapter 11, Performance Optimization in CUDA
Chapter 12, Where to Go from Here
Other Books You May Enjoy
Leave a review - let other readers know what you think