I am using a Dockerized image with Jupyter Notebook and a SparkR kernel. When I create a SparkR notebook, it uses an install of Microsoft R (3.3.2) instead of vanilla CRAN R.
To use a custom R environment, I believe you need to set the following application properties when you start Spark:
"spark.r.command": "/custom/path/bin/R",
"spark.r.driver.command": "/custom/path/bin/Rscript",
"spark.r.shell.command" : "/custom/path/bin/R"
These properties are documented in more detail here: https://spark.apache.org/docs/latest/configuration.html#sparkr
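As a sketch of one way to set them (assuming a standard Spark layout, where $SPARK_HOME/conf/spark-defaults.conf is read at startup, and with /custom/path standing in for your own R installation):

# $SPARK_HOME/conf/spark-defaults.conf
# Placeholder paths -- point these at your own R installation
spark.r.command         /custom/path/bin/Rscript
spark.r.driver.command  /custom/path/bin/Rscript
spark.r.shell.command   /custom/path/bin/R

Equivalently, they can be passed on the command line, e.g. spark-submit --conf spark.r.command=/custom/path/bin/Rscript ...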
Docker-related issues aside, the settings for Jupyter kernels are configured in files named kernel.json, each residing in its own directory (one per kernel). These directories can be listed with the command jupyter kernelspec list; for example, here is the output on my (Linux) machine:
$ jupyter kernelspec list
Available kernels:
  python2       /usr/lib/python2.7/site-packages/ipykernel/resources
  caffe         /usr/local/share/jupyter/kernels/caffe
  ir            /usr/local/share/jupyter/kernels/ir
  pyspark       /usr/local/share/jupyter/kernels/pyspark
  pyspark2      /usr/local/share/jupyter/kernels/pyspark2
  tensorflow    /usr/local/share/jupyter/kernels/tensorflow
Again, as an example, here are the contents of the kernel.json for my R kernel (ir):
{
  "argv": ["/usr/lib64/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
  "display_name": "R 3.3.2",
  "language": "R"
}
And here is the corresponding file for my pyspark2 kernel:
{
  "display_name": "PySpark (Spark 2.0)",
  "language": "python",
  "argv": [
    "/opt/intel/intelpython27/bin/python2",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/home/ctsats/spark-2.0.0-bin-hadoop2.6",
    "PYTHONPATH": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python:/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/lib/py4j-0.10.1-src.zip",
    "PYTHONSTARTUP": "/home/ctsats/spark-2.0.0-bin-hadoop2.6/python/pyspark/shell.py",
    "PYSPARK_PYTHON": "/opt/intel/intelpython27/bin/python2"
  }
}
As you can see, in both cases the first element of argv is the executable for the respective language: in my case, GNU R for my ir kernel and Intel Python 2.7 for my pyspark2 kernel. Changing this so that it points to your GNU R executable should resolve your issue; a sketch of what such a kernel.json might look like follows below.
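For completeness, here is a minimal sketch of a kernel.json for a SparkR kernel, combining the two examples above. This is an assumption based on my own files, not something taken from your setup: the R path (/usr/lib64/R/bin/R), the Spark path (/opt/spark), and the display name are all placeholders for your own values, and it presumes the IRkernel package is installed in that R (JSON does not allow comments, so the placeholders can only be flagged here):

{
  "display_name": "SparkR (GNU R 3.3.2)",
  "language": "R",
  "argv": ["/usr/lib64/R/bin/R", "--slave", "-e", "IRkernel::main()", "--args", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/opt/spark"
  }
}

With SPARK_HOME exported this way, you can then attach SparkR from inside the notebook, since Spark ships the SparkR package under $SPARK_HOME/R/lib, e.g. library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib")).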