Installing SparkR

不思量自难忘° 2020-11-27 03:21

I have the latest version of R, 3.2.1. Now I want to install SparkR in R. After I execute:

> install.packages("SparkR")

I got back:

4 answers
  •  一生所求
    2020-11-27 04:05

    I faced a similar issue while trying out SparkR on EMR with Spark 2.0.0. Here are the steps I followed to install RStudio Server, SparkR, and sparklyr, and finally to connect to a Spark session on an EMR cluster:

    1. Install RStudio Server: once the EMR cluster is up and running, SSH to the master node as user 'hadoop@' and download RStudio Server:

    wget https://download2.rstudio.org/rstudio-server-rhel-0.99.903-x86_64.rpm

    Then install it with yum:

    sudo yum install --nogpgcheck rstudio-server-rhel-0.99.903-x86_64.rpm

    Finally, add a user who can log in to the RStudio web console:

    sudo su

    sudo useradd username

    sudo echo username:password | chpasswd

    2. To access the RStudio web console, create an SSH tunnel from your machine to the EMR master node:

    ssh -NL 8787:ec2-emr-master-node-ip.compute-1.amazonaws.com:8787 hadoop@ec2-emr-master-node-ip.compute-1.amazonaws.com&

    3. Open any browser and go to localhost:8787 to reach the RStudio web console, then log in with the username:password combination created in step 1.

    4. Before installing the required R packages, install libcurl on the master node:

    sudo yum update

    sudo yum -y install libcurl-devel

    5. Resolve permission issues with:

    sudo -u hdfs hadoop fs -mkdir /user/

    sudo -u hdfs hadoop fs -chown /user/

    6. Check the Spark version on EMR and set SPARK_HOME:

    spark-submit --version

    export SPARK_HOME='/usr/lib/spark/'

    7. Now, in the RStudio console, install SparkR, sparklyr, and connect:

    install.packages('devtools')

    devtools::install_github('apache/spark@v2.0.0', subdir='R/pkg')

    install.packages('sparklyr')

    library(SparkR)

    library(sparklyr)

    Sys.setenv(SPARK_HOME='/usr/lib/spark')

    sc <- spark_connect(master = "yarn-client")
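
If the connection fails, one thing worth ruling out is a wrong SPARK_HOME. Below is a minimal sketch of a pre-flight check you could run on the master node before opening RStudio. The function name check_spark_home is a hypothetical helper, not part of Spark or EMR, and the R/lib/SparkR location is an assumption based on how Spark distributions usually bundle SparkR; your EMR image may differ.

```shell
#!/usr/bin/env bash
# check_spark_home: hypothetical helper (not part of Spark or EMR).
# Takes a directory as its first argument, falling back to $SPARK_HOME.
check_spark_home() {
  local home="${1:-${SPARK_HOME:-}}"

  # Fail early if nothing was passed and SPARK_HOME is unset or empty.
  if [ -z "$home" ]; then
    echo "SPARK_HOME is not set"
    return 1
  fi

  # Fail if the path is not an existing directory.
  if [ ! -d "$home" ]; then
    echo "SPARK_HOME does not exist: $home"
    return 1
  fi

  # Assumption: a bundled SparkR package, when present, sits under R/lib.
  # ${home%/} drops one trailing slash so the printed path has no "//".
  if [ -d "${home%/}/R/lib/SparkR" ]; then
    echo "bundled SparkR found: ${home%/}/R/lib/SparkR"
  else
    echo "no bundled SparkR under ${home%/}/R/lib"
  fi
  return 0
}
```

For the setup above you would call it as `check_spark_home /usr/lib/spark`. A missing bundled SparkR is reported but not treated as an error, since step 7 installs SparkR from GitHub anyway.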
