Can sparklyr be used with Spark deployed on a YARN-managed Hadoop cluster?

自闭症患者 · 2020-12-06 08:00

Is the sparklyr R package able to connect to YARN-managed Hadoop clusters? This doesn't seem to be covered in the cluster deployment documentation. Using the Spark

4 Answers
  •  轻奢々
     2020-12-06 08:26

    Are you possibly using Cloudera Hadoop (CDH)?

    I ask because I had the same issue when using the CDH-provided Spark distribution:

    Sys.getenv('SPARK_HOME')
    [1] "/usr/lib/spark"  # CDH-provided Spark
    library(sparklyr)
    sc <- spark_connect(master = "yarn-client")
    Error in sparkapi::start_shell(master = master, spark_home = spark_home,  : 
          Failed to launch Spark shell. Ports file does not exist.
            Path: /usr/lib/spark/bin/spark-submit
            Parameters: --jars, '/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library/sparklyr/java/sparklyr.jar', --packages, 'com.databricks:spark-csv_2.11:1.3.0','com.amazonaws:aws-java-sdk-pom:1.10.34', sparkr-shell, /tmp/Rtmp6RwEnV/file307975dc1ea0.out
    
    Ivy Default Cache set to: /home/oracle/.ivy2/cache
    The jars for the packages stored in: /home/oracle/.ivy2/jars
    :: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    com.databricks#spark-csv_2.11 added as a dependency
    com.amazonaws#aws-java-sdk-pom added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found com.databricks#spark-csv_2.11;1.3.0 in central
        found org.apache.commons#commons-csv;1.1 in central
        found com.univocity#univocity-parsers;1.5.1 in central
        found com.
    

    However, after I downloaded a pre-built version from Databricks (Spark 1.6.1, Hadoop 2.6) and pointed SPARK_HOME there, I was able to connect successfully:

    Sys.setenv(SPARK_HOME = '/home/oracle/spark-1.6.1-bin-hadoop2.6') 
    sc <- spark_connect(master = "yarn-client") # OK
    library(dplyr)
    iris_tbl <- copy_to(sc, iris)
    src_tbls(sc)
    [1] "iris"
    

    Cloudera does not yet include SparkR in its distribution, and I suspect that sparklyr may still have some subtle dependency on SparkR. Here are the results when trying to work with the CDH-provided Spark, but using the config=list() argument, as suggested in this thread in the sparklyr issues on GitHub:

    sc <- spark_connect(master='yarn-client', config=list()) # with CDH-provided Spark
    Error in sparkapi::start_shell(master = master, spark_home = spark_home,  : 
      Failed to launch Spark shell. Ports file does not exist.
        Path: /usr/lib/spark/bin/spark-submit
        Parameters: --jars, '/u01/app/oracle/product/12.1.0.2/dbhome_1/R/library/sparklyr/java/sparklyr.jar', sparkr-shell, /tmp/Rtmpi9KWFt/file22276cf51d90.out
    
    Error: sparkr.zip does not exist for R application in YARN mode.
    

    Also, if you look at the rightmost part of the Parameters section of the error (both yours and mine), you'll see a reference to sparkr-shell...

    (Tested with sparklyr 0.2.28, sparkapi 0.3.15, R session from RStudio Server, Oracle Linux)
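    To summarize, the workaround is simply to point sparklyr at a standalone pre-built Spark instead of the CDH-provided one. As a minimal sketch (the path below is from my setup and will differ on yours), note that spark_connect() also accepts the Spark home directly via its spark_home argument, which avoids mutating the environment with Sys.setenv():

    ```r
    library(sparklyr)
    library(dplyr)

    # Use a pre-built Spark distribution (adjust the path to your machine)
    # rather than the CDH-provided /usr/lib/spark
    sc <- spark_connect(master = "yarn-client",
                        spark_home = "/home/oracle/spark-1.6.1-bin-hadoop2.6")

    iris_tbl <- copy_to(sc, iris)   # copy a local data frame to the cluster
    src_tbls(sc)                    # should list "iris"

    spark_disconnect(sc)
    ```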
