Question
I am working on a project that uses PySpark, and I would like to set up automated tests.
Here's what my .gitlab-ci.yml file looks like:
image: "myimage:latest"
stages:
  - Tests
pytest:
  stage: Tests
  script:
    - pytest tests/.
I built the Docker image myimage using a Dockerfile such as the following (see this excellent answer):
FROM python:3.7
RUN python --version
# Create app directory
WORKDIR /app
# copy requirements.txt
COPY local-src/requirements.txt ./
# Install app dependencies
RUN pip install -r requirements.txt
# Bundle app source
COPY src /app
However, when I run this, the GitLab CI job fails with the following error:
/usr/local/lib/python3.7/site-packages/pyspark/java_gateway.py:95: in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
E Exception: Java gateway process exited before sending the driver its port number
------------------------------- Captured stderr --------------------------------
JAVA_HOME is not set
I understand that PySpark requires Java 8 or higher to be installed. I have this set up locally, but what about during the CI process? How can I install Java so that it works there?
I have tried adding
RUN sudo add-apt-repository ppa:webupd8team/java
RUN sudo apt-get update
RUN apt-get install oracle-java8-installer
to the Dockerfile used to build the image, but got the error
/bin/sh: 1: sudo: not found
How can I modify the Dockerfile so that tests using pyspark will work?
Answer 1:
Solution that worked for me: add
RUN apt-get update
RUN apt-get install default-jdk -y
before
RUN pip install -r requirements.txt
It then all worked as expected with no further modifications needed!
EDIT
To make this work, I had to update my base image to python:3.7-stretch.
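The "JAVA_HOME is not set" failure can be surfaced earlier, with a clearer message, by a small pre-flight check run before the tests. The following is a minimal sketch and not part of the original answer: find_java and require_java are hypothetical helper names, and the lookup order ($JAVA_HOME/bin/java first, then PATH) is an assumption about how Spark's launcher locates Java.

```python
import os
import shutil
from typing import Mapping, Optional


def find_java(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the path of a java executable, or None if none is found.

    Looks at $JAVA_HOME/bin/java first, then searches the directories in PATH.
    If `env` has no PATH entry, shutil.which falls back to the process PATH.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("java", path=env.get("PATH"))


def require_java(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the java path, or raise a descriptive error if Java is missing."""
    java = find_java(env)
    if java is None:
        raise RuntimeError(
            "No Java runtime found: install a JDK in the CI image "
            "(e.g. apt-get install default-jdk) or set JAVA_HOME."
        )
    return java
```

Calling require_java() at the top of tests/conftest.py (an assumed location) would make pytest stop with this message before any Spark session is created.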
Answer 2:
Add the following to your .bash_profile, substituting your JDK's home directory for the path:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/[yourjdk]/Contents/Home
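Inside a CI container a .bash_profile may not be read at all, so a comparable effect could be achieved from the test setup itself. This is a rough sketch under stated assumptions: java is already on PATH, and the JDK uses the common <home>/bin/java layout (which does not hold for the /usr/bin/java stub on macOS). The names infer_java_home and ensure_java_home are hypothetical.

```python
import os
import shutil
from typing import Optional


def infer_java_home(java_path: str) -> str:
    """Guess JAVA_HOME from a java executable path: <home>/bin/java -> <home>."""
    # Follow symlinks first (e.g. /usr/bin/java often points into /usr/lib/jvm/...)
    real = os.path.realpath(java_path)
    return os.path.dirname(os.path.dirname(real))


def ensure_java_home() -> Optional[str]:
    """Set JAVA_HOME for this process if it is unset and java is on PATH.

    Returns the effective JAVA_HOME, or None if no java executable was found.
    """
    current = os.environ.get("JAVA_HOME")
    if current:
        return current
    java = shutil.which("java")
    if java is None:
        return None
    os.environ["JAVA_HOME"] = infer_java_home(java)
    return os.environ["JAVA_HOME"]
```

Because child processes inherit the environment, calling ensure_java_home() before the first Spark session is created should be enough for the JVM launcher to see the variable.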
Source: https://stackoverflow.com/questions/57676684/ci-cd-tests-involving-pyspark-java-home-is-not-set