I have the latest version of R, 3.2.1, and I want to install SparkR. When I execute:
> install.packages("SparkR")
I get back:
I also faced a similar issue while trying to play with SparkR on EMR with Spark 2.0.0. Here are the steps I followed to install RStudio Server, SparkR, and sparklyr, and finally connect to a Spark session on an EMR cluster.

First, download the RStudio Server RPM on the EMR master node:
wget https://download2.rstudio.org/rstudio-server-rhel-0.99.903-x86_64.rpm
Then install it with yum (skipping the GPG check):
sudo yum install --nogpgcheck rstudio-server-rhel-0.99.903-x86_64.rpm
Finally, add a user that will log in to the RStudio web console (the sudo prefixes are redundant once you are root, and piping through sudo would not elevate chpasswd anyway):
sudo su
useradd username
echo username:password | chpasswd
Next, open an SSH tunnel from your local machine to the EMR master node so that RStudio's port 8787 is reachable locally:
ssh -NL 8787:ec2-emr-master-node-ip.compute-1.amazonaws.com:8787 hadoop@ec2-emr-master-node-ip.compute-1.amazonaws.com&
Now open any browser, go to localhost:8787 to reach the RStudio web console, and log in with the username:password combination you created above.
To install the required R packages, you first need to install libcurl on the master node:
sudo yum update
sudo yum -y install libcurl-devel
SparkR on YARN also needs an HDFS home directory for the new user, so create one and make the user its owner (replace username with the account you created earlier):
sudo -u hdfs hadoop fs -mkdir /user/username
sudo -u hdfs hadoop fs -chown username /user/username
Next, confirm the Spark version and find SPARK_HOME by running:
spark-submit --version
and then export the Spark installation path:
export SPARK_HOME='/usr/lib/spark/'
Now, in RStudio, install SparkR (matching the cluster's Spark version) and sparklyr like below:
install.packages('devtools')
devtools::install_github('apache/spark@v2.0.0', subdir='R/pkg')
install.packages('sparklyr')
Finally, load both libraries, point R at SPARK_HOME, and connect to the cluster through YARN:
library(SparkR)
library(sparklyr)
Sys.setenv(SPARK_HOME='/usr/lib/spark')
sc <- spark_connect(master = "yarn-client")
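Once connected, a quick way to sanity-check the session is to copy a small local data frame into Spark and run a simple dplyr query against it. This is a minimal sketch; it assumes the spark_connect() call above succeeded (so sc is a live connection) and that the dplyr package is installed:

```r
library(sparklyr)
library(dplyr)

# Copy the built-in mtcars data frame into the Spark session as a Spark table
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Run a simple aggregation on the cluster: mean mpg per cylinder count.
# collect() brings the result back into a local R data frame.
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

# Disconnect when done
spark_disconnect(sc)
```

If the copy_to() and collect() round trip works, the YARN connection and the HDFS user directory are both set up correctly.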