Why does it take ages to install Pandas on Alpine Linux

前端 未结 8 1399
孤独总比滥情好
孤独总比滥情好 2020-11-29 18:19

I\'ve noticed that installing Pandas and Numpy (it\'s dependency) in a Docker container using the base OS Alpine vs. CentOS or Debian takes much longer. I created a little t

8条回答
  •  孤城傲影
    2020-11-29 19:03

    Debian based images use only python pip to install packages with .whl format:

      Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
      Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
    

    WHL format was developed as a quicker and more reliable method of installing Python software than re-building from source code every time. WHL files only have to be moved to the correct location on the target system to be installed, whereas a source distribution requires a build step before installation.

    Wheel packages pandas and numpy are not supported in images based on Alpine platform. That's why when we install them using python pip during the building process, we always compile them from the source files in alpine:

      Downloading pandas-0.22.0.tar.gz (11.3MB)
      Downloading numpy-1.14.1.zip (4.9MB)
    

    and we can see the following inside container during the image building:

    / # ps aux
    PID   USER     TIME   COMMAND
        1 root       0:00 /bin/sh -c pip install pandas
        7 root       0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
       21 root       0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
      496 root       0:00 sh
      660 root       0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
      661 root       0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
      662 root       0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
      663 root       0:00 ps aux
    

    If we modify Dockerfile a little:

    FROM python:3.6.4-alpine3.7
    RUN apk add --no-cache g++ wget
    RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
    RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
    

    we get the following error:

    Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
     ---> Running in 0faea63e2bda
    pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
    The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
    

    Unfortunately, the only way to install pandas on an Alpine image is to wait until build finishes.

    Of course if you want to use the Alpine image with pandas in CI for example, the best way to do so is to compile it once, push it to any registry and use it as a base image for your needs.

    EDIT: If you want to use the Alpine image with pandas you can pull my nickgryg/alpine-pandas docker image. It is a python image with pre-compiled pandas on the Alpine platform. It should save your time.

提交回复
热议问题