2.5 Installing TensorFlow

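Installing TensorFlow is fairly straightforward. The official website provides detailed instructions; see the installation guide on tensorflow.org for the specifics.

We recommend installing TensorFlow inside a Virtualenv environment. After installing TensorFlow and GPU (graphics processing unit) support, verify that the installation works.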
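
Before installing any packages, it can help to confirm that the interpreter really is running inside the virtual environment. The following is a minimal sketch in Python; the helper name in_virtualenv is ours, not part of TensorFlow or Virtualenv, and it relies only on the standard sys module.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Classic virtualenv (common with Python 2.7) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from
    # sys.base_prefix; fall back to sys.prefix when base_prefix is missing.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtual environment: %s" % in_virtualenv())
```

If this prints False, activate the environment first so that the pip commands later in this section install into it rather than into the system Python.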

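Because some users run into problems when installing GPU support for TensorFlow, the rest of this section walks through that installation.

First, register on the NVIDIA Developer website.

Then follow the instructions at this link to set up the GPU.

Next, install CUDA Toolkit 9.0. The link on tensorflow.org always points to the latest CUDA release, which is 9.2 at the time of writing. Do not use 9.2 until TensorFlow supports it; use the CUDA 9.0 release linked above instead.

Likewise, download and install cuDNN v7.1.4 for CUDA 9.0; the latest cuDNN linked from tensorflow.org is v7.1.4 for CUDA 9.2. After installation, run the following command: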

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
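
The release number in the last line of this output is what has to match the CUDA version TensorFlow expects. As a rough sketch, the check can be automated in Python; parse_nvcc_release is a hypothetical helper of ours, and SAMPLE simply repeats the output shown above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release as a (major, minor) tuple, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The nvcc -V output shown above, pasted in as a string.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

release = parse_nvcc_release(SAMPLE)
print(release == (9, 0))  # True for the sample above
```

To check a live system instead of the pasted sample, the text could be obtained with subprocess.check_output(["nvcc", "-V"]) and decoded before being passed to the helper.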

Next, run "$ nvidia-smi", which produces the following output:

Fri Jun 15 22:21:08 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K600         Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   48C    P0    N/A /  N/A |      0MiB /   979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
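
The table above is meant for people to read. For scripted checks, nvidia-smi also offers a CSV query mode (--query-gpu with --format=csv,noheader). The sketch below parses that form; the helper names are ours, and the sample line is an illustration assembled from the values in the table above, not captured output.

```python
import subprocess


def parse_gpu_csv(csv_text):
    """Parse 'name, driver_version, memory.total' CSV lines into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "driver": driver,
            # memory.total is reported like "979 MiB"; keep the number only.
            "memory_mib": int(memory.split()[0]),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi in CSV mode (requires the NVIDIA driver)."""
    output = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=name,driver_version,memory.total",
        "--format=csv,noheader",
    ])
    return parse_gpu_csv(output.decode("utf-8"))


# Illustrative line built from the table above (an assumption, not real output).
print(parse_gpu_csv("Quadro K600, 384.130, 979 MiB\n"))
```

Calling query_gpus() on a machine without the NVIDIA driver raises an error, which is itself a useful signal that the driver step has not been completed.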

In addition, run deviceQuery from the CUDA sample code to make sure the GPU works properly.

Device 0: "Quadro 600"
CUDA Driver Version / Runtime Version            9.0 / 9.0
CUDA Capability Major/Minor version number:     2.1
Total amount of global memory:                    962 MBytes (1009254400 bytes)

The result is as follows:

Quadro M6000

If you see a similar result, your graphics card can support TensorFlow.

Next, run the following command:

$ ./bin/x86_64/linux/release/deviceQuery

The output shows the card's model and version numbers along with various performance figures:

./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro M6000 24GB"
  CUDA Driver Version / Runtime Version           9.0 / 9.0
  CUDA Capability Major/Minor version number:  5.2
  Total amount of global memory:              24467 MBytes (25655836672 bytes)
  (24) Multiprocessors, (128) CUDA Cores/MP:      3072 CUDA Cores
  GPU Max Clock rate:                                  1114 MHz (1.11 GHz)
  Memory Clock rate:                                   3305 Mhz
  Memory Bus Width:                                    384-bit
  L2 Cache Size:                                        3145728 bytes
  Maximum Texture Dimension Size (x, y, z)        1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:                  65536 bytes
  Total amount of shared memory per block:        49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                             32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:             1024
  Max dimension size of a thread block (x, y, z): (1024, 1024, 64)
  Max dimension size of a grid size     (x, y, z): (2147483647, 65535, 65535)
  Maximum memory pitch:                               2147483647 bytes
  Texture alignment:                                   512 bytes
  Concurrent copy and kernel execution:            Yes with 2 copy engine(s)
  Run time limit on kernels:                         Yes
  Integrated GPU sharing Host Memory:              No
  Support host page-locked memory mapping:        Yes
  Alignment requirement for Surfaces:              Yes
  Device has ECC support:                             Disabled
  Device supports Unified Addressing (UVA):       Yes
  Supports Cooperative Kernel Launch:              No
  Supports MultiDevice Co-op Kernel Launch:       No
  Device PCI Domain ID / Bus ID / location ID:    0 / 4 / 0
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M6000 24GB   Off  | 00000000:04:00.0  On |                  Off |
| 25%   41C    P8    20W / 250W |    488MiB / 24467MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2183      G   /usr/lib/xorg/Xorg                           319MiB |
|    0      3796      G   compiz                                        92MiB |
|    0      6095      G   ...-token=32ADD0D4261B4355966B2810A61BBF37    72MiB |
+-----------------------------------------------------------------------------+
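
The two things worth reading off the long deviceQuery listing are the final "Result = PASS" line and the compute capability of each device. A small Python sketch of that check follows; the function names are ours, the excerpt is copied from the output above, and the minimum capability is left as a caller-supplied value because the threshold depends on the TensorFlow release you install.

```python
import re


def summarize_device_query(output):
    """Extract device names, compute capabilities and the PASS flag."""
    names = re.findall(r'^Device \d+: "(.+)"', output, flags=re.MULTILINE)
    caps = re.findall(
        r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)", output)
    devices = [(name, (int(major), int(minor)))
               for name, (major, minor) in zip(names, caps)]
    return {"devices": devices, "passed": "Result = PASS" in output}


def meets_capability(summary, minimum):
    """True if the run passed and every device reaches the minimum (major, minor)."""
    return summary["passed"] and all(
        cap >= minimum for _, cap in summary["devices"])


# Excerpt of the deviceQuery output shown above.
EXCERPT = """Detected 1 CUDA Capable device(s)

Device 0: "Quadro M6000 24GB"
  CUDA Driver Version / Runtime Version           9.0 / 9.0
  CUDA Capability Major/Minor version number:  5.2

Result = PASS"""

summary = summarize_device_query(EXCERPT)
print(summary["devices"], summary["passed"])
```

A call such as meets_capability(summary, (3, 5)) compares tuples element by element; the value (3, 5) here is only an example and should be confirmed against the GPU requirements documented for your TensorFlow version.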

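Finally, install TensorFlow itself. Use the tensorflow package for CPU-only operation or the tensorflow-gpu package for GPU support, choosing the command that matches your Python version: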

(tensorflow)$ pip install --upgrade tensorflow       # for Python 2.7
(tensorflow)$ pip3 install --upgrade tensorflow      # for Python 3.n
(tensorflow)$ pip install --upgrade tensorflow-gpu  # for Python 2.7 and GPU
(tensorflow)$ pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU

After a successful installation, confirm it with the following session:

(tensorflow) $ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant("hello")
>>> sess = tf.Session()
2018-06-20 06:54:34.284161: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-20 06:54:34.460555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Quadro M6000 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.114
pciBusID: 0000:04:00.0
totalMemory: 23.89GiB freeMemory: 23.29GiB
2017-05-20 06:54:34.460600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2017-05-20 06:54:34.708584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2017-05-20 06:54:34.708635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2017-05-20 06:54:34.708644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2017-05-20 06:54:34.709069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22598 MB memory) -> physical GPU (device: 0, name: Quadro M6000 24GB, pci bus id: 0000:04:00.0, compute capability: 5.2)
>>> print(sess.run(hello))
hello

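The log output above confirms that TensorFlow is using the GPU. If these device lines do not appear, GPU support was not installed successfully and TensorFlow is running on the CPU.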
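
Instead of scanning the log by eye, the same question can be asked programmatically. The sketch below assumes a TensorFlow 1.x installation and uses device_lib.list_local_devices() from tensorflow.python.client; the helper names gpu_names and local_gpu_names are ours.

```python
def gpu_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def local_gpu_names():
    """List the GPUs that the installed TensorFlow can see."""
    # Imported lazily so gpu_names() stays usable without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())
```

Calling local_gpu_names() inside the (tensorflow) environment should return a non-empty list, with a name such as /device:GPU:0, when GPU support is working. An empty list means TensorFlow sees only the CPU.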