How to compile Tensorflow with SSE4.2 and AVX instructions?

南笙 2020-11-22 04:14

This is the message received from running a script to check if Tensorflow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUD         


        
12 Answers
  •  夕颜 (OP)
     2020-11-22 04:29

    Let me answer your 3rd question first:

    If you want to run a self-compiled version within a conda-env, you can. These are the general instructions I run to get TensorFlow installed on my system, with some additional notes. Note: this build was for an AMD A10-7850 (check your CPU for which instructions are supported; it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda-env. Credit goes to the tensorflow source install page and the answers provided above.

    git clone https://github.com/tensorflow/tensorflow 
    # Install Bazel
    # https://bazel.build/versions/master/docs/install.html
    sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
    # Create your virtual env with conda.
    source activate YOUR_ENV
    pip install six numpy wheel packaging appdirs
    # Follow the configure instructions at:
    # https://www.tensorflow.org/install/install_sources
    # Build your build like below. Note: Check what instructions your CPU 
    # support. Also. If resources are limited consider adding the following 
    # tag --local_resources 2048,.5,1.0 . This will limit how much ram many
    # local resources are used but will increase time to compile.
    bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2  -k //tensorflow/tools/pip_package:build_pip_package
    # Create the wheel like so:
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    # Inside your conda env:
    pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
    # Then install the rest of your stack
    pip install keras jupyter etc. etc.
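
    Before picking the --copt flags above, it helps to confirm which SIMD extensions your CPU actually reports. A minimal sketch, assuming Linux, where /proc/cpuinfo exposes a `flags` line (the FLAGS string below is a hypothetical example, not output from any real machine):

```shell
# List the SIMD extensions relevant to the bazel --copt flags.
# On a real Linux box you would read the flags line with:
#   FLAGS="$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
# The string below is a hypothetical sample for illustration.
FLAGS="fpu vme sse sse2 ssse3 sse4_1 sse4_2 avx avx2 fma"

# Split on spaces and keep only the extensions that matter here.
echo "$FLAGS" | tr ' ' '\n' | grep -E '^(sse4_1|sse4_2|avx2?|fma)$' | sort
```

    Each extension that prints maps to a compiler option: sse4_1 to -msse4.1, sse4_2 to -msse4.2, avx to -mavx, avx2 to -mavx2, fma to -mfma.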
    

    As to your 2nd question:

    A self-compiled version with optimizations is well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now only take about 300 seconds! Although the exact numbers will vary, I think you can expect about a 35-50% speed increase in general on your particular setup.

    Lastly your 1st question:

    A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2, and FMA are different kinds of extended instruction sets on x86 CPUs. Many contain optimized instructions for processing matrix or vector operations.

    I will highlight my own misconception to hopefully save you some time: It's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

    In the context of tensorflow compilation, if your computer supports AVX2, AVX, SSE4.1, and SSE4.2, you should put the optimizing flags in for all of them. Don't do what I did and just go with SSE4.2, thinking it's newer and should supersede SSE4.1. That's clearly WRONG! I had to recompile because of that, which cost me a good 40 minutes.
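
    One way to avoid exactly that mistake is to generate the --copt list mechanically from the CPU's reported flags rather than typing it by hand. A hedged sketch: the flags_to_copts helper below is hypothetical, not part of the TensorFlow build, and it assumes the Linux /proc/cpuinfo flag names:

```shell
# Hypothetical helper: map space-separated CPU flag names to bazel --copt options.
flags_to_copts() {
  copts=""
  for f in $1; do
    case "$f" in
      sse4_1) copts="$copts --copt=-msse4.1" ;;
      sse4_2) copts="$copts --copt=-msse4.2" ;;
      avx)    copts="$copts --copt=-mavx" ;;
      avx2)   copts="$copts --copt=-mavx2" ;;
      fma)    copts="$copts --copt=-mfma" ;;
    esac
  done
  echo "$copts"
}

# On Linux, feed it the real flags line from /proc/cpuinfo:
#   flags_to_copts "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
# and splice the result into the bazel build command.
flags_to_copts "sse4_1 sse4_2 avx"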
