SparkR

Reading Text file in SparkR 1.4.0

Anonymous (unverified), submitted on 2019-12-03 01:20:02
Question: Does anyone know how to read a text file in SparkR version 1.4.0? Are there any Spark packages available for that?

Answer 1:

Spark 1.6+

You can use the text input format to read a text file as a DataFrame:

    read.df(sqlContext = sqlContext, source = "text", path = "README.md")

Spark <= 1.5

The short answer is: you don't. SparkR 1.4 has been almost completely stripped of its low-level API, leaving only a limited subset of DataFrame operations. As you can read on an old SparkR webpage: As of April 2015, SparkR has been officially merged into
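A minimal sketch of the Spark 1.6+ approach from the answer, assuming an initialized sqlContext and that README.md exists in the working directory:

```r
# Assumes SparkR 1.6.x with a running sqlContext.
library(SparkR)

# With source = "text", each line of the file becomes one row
# in a single string column named "value".
lines <- read.df(sqlContext = sqlContext, source = "text", path = "README.md")

printSchema(lines)  # root |-- value: string
head(lines)         # first few lines of the file as rows
```

In Spark 2.x the same idea is spelled `read.text("README.md")` against a SparkSession.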

Reading csv data into SparkR after writing it out from a DataFrame

Anonymous (unverified), submitted on 2019-12-03 01:08:02
Question: I followed the example in this post to write out a DataFrame as a csv to an AWS S3 bucket. The result was not a single file but rather a folder with many .csv files. I'm now having trouble reading this folder back in as a DataFrame in SparkR. Below is what I've tried, but neither attempt results in the same DataFrame that I wrote out.

    write.df(df, 's3a://bucket/df', source = "csv")  # Creates a folder named df in the S3 bucket
    df_in1 <- read.df("s3a://bucket/df", source = "csv")
    df_in2 <- read.df("s3a://bucket/df/*.csv", source =
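A sketch of reading such a part-file folder back, assuming Spark 2.x SparkR (where "csv" is a built-in source) and the bucket path from the question. Writing and reading with an explicit header is an assumption here; without it, Spark assigns generic column names like `_c0` on read, which is one common reason the round-tripped DataFrame does not match:

```r
# Sketch: round-trip a DataFrame through a folder of CSV part files on S3.
library(SparkR)

# Write with a header row so column names survive the round trip.
write.df(df, "s3a://bucket/df", source = "csv", header = "true")

# Point read.df at the folder itself (not a glob); Spark reads every
# part file inside it and unions them into one DataFrame.
df_in <- read.df("s3a://bucket/df", source = "csv",
                 header = "true", inferSchema = "true")
```

Note that `inferSchema` costs an extra pass over the data; supplying an explicit schema to `read.df` avoids it.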

How to make a new DataFrame in sparkR

北慕城南, submitted on 2019-12-02 17:35:46
Question: In SparkR I have data as a DataFrame. I can select the rows matching a single value like this:

    newdata <- filter(data, data$column == 1)

How can I match more than one value? Say I want to select all elements in the vector list <- c(1,6,10,11,14), or if list is a DataFrame 1 6 10 11 14.

    newdata <- filter(data, data$column == list)

If I do it like this I get an error.

Answer 1: If you are ultimately trying to filter a Spark DataFrame by a list of unique values, you can do this with a merge operation. If you are
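Two ways this filtering can be sketched, assuming a SparkR (1.5+) DataFrame named `data` with a numeric column `column`:

```r
# Sketch: keep only rows whose 'column' value appears in a local vector.
library(SparkR)
wanted <- c(1, 6, 10, 11, 14)

# Option 1: %in% is defined for SparkR columns and takes a local vector.
newdata <- filter(data, data$column %in% wanted)

# Option 2 (the merge approach the answer mentions): inner-join against
# a one-column DataFrame built from the wanted values.
lookup   <- createDataFrame(sqlContext, data.frame(column = wanted))
newdata2 <- merge(data, lookup, by = "column")
```

The `%in%` form is simpler for a short local list; the merge form scales better when the list of values is itself large or already lives in Spark.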

Error while installing SparkR package using install_github

旧巷老猫, submitted on 2019-12-02 17:23:05
Question: I am trying to use the SparkR package in R. I have all the dependencies, such as devtools, Rtools.exe, etc. When I try the following command:

    install_github("amplab-extras/SparkR-pkg", subdir = "pkg")

I get the following error:

    Downloading github repo amplab-extras/SparkR-pkg@master
    Error in function (type, msg, asError = TRUE) :
      Received HTTP code 403 from proxy after CONNECT

To solve this I have set a working http_proxy and https_proxy, but it is not working and still throws the above error. Please guide

Error while installing SparkR package using install_github

六月ゝ 毕业季﹏, submitted on 2019-12-02 11:32:59
I am trying to use the SparkR package in R. I have all the dependencies, such as devtools, Rtools.exe, etc. When I try the following command:

    install_github("amplab-extras/SparkR-pkg", subdir = "pkg")

I get the following error:

    Downloading github repo amplab-extras/SparkR-pkg@master
    Error in function (type, msg, asError = TRUE) :
      Received HTTP code 403 from proxy after CONNECT

To solve this I have set a working http_proxy and https_proxy, but it is not working and still throws the above error. Please guide me, as I am new to R/RStudio. I have installed SparkR on Windows 7, 64 bit with R-3.2.x and have Spark 1.4
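An HTTP 403 from the proxy after CONNECT usually means the proxy needs credentials that the R session is not passing. A sketch of configuring them before retrying, where the host, port, and credentials are placeholders, not values from the question:

```r
# Sketch: route devtools/httr downloads through an authenticating proxy.
library(httr)
set_config(use_proxy(url = "proxy.example.com", port = 8080,
                     username = "user", password = "pass"))

# Environment variables, as the asker tried; some proxies also require
# the scheme and credentials inline.
Sys.setenv(http_proxy  = "http://user:pass@proxy.example.com:8080",
           https_proxy = "http://user:pass@proxy.example.com:8080")

library(devtools)
install_github("amplab-extras/SparkR-pkg", subdir = "pkg")
```

Note also that amplab-extras/SparkR-pkg is the pre-1.4 standalone package; since April 2015 SparkR ships inside the Spark distribution itself, so installing from that repo is rarely the right path anyway.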

if null replace with 0, otherwise default value in same column

允我心安, submitted on 2019-12-02 11:11:02
Question: In the SparkR 1.5.0 shell, I created a sample data set:

    df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))
    df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1,2,3,4,5,6,7,8)))
    df_test2 <- join(df_test1, df_test, joinExpr = df_test1$mon1 == df_test$mon, joinType = "left_outer")

Data set df_test2:

    +----+----+------+
    |mon1| mon|  year|
    +----+----+------+
    | 7.0|null|  null|
    | 1.0| 1.0|2011.0|
    | 6.0|null|  null|
    | 3.0| 3.0|2013.0|
    | 5.0| 5
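A sketch of what the title asks for — replace the nulls produced by the left outer join with 0 while keeping the existing values in the same column — assuming the `df_test2` built above:

```r
# Sketch for SparkR 1.5.x: per-column null replacement via ifelse/isNull.
library(SparkR)
df_test2$mon  <- ifelse(isNull(df_test2$mon),  0, df_test2$mon)
df_test2$year <- ifelse(isNull(df_test2$year), 0, df_test2$year)

# In SparkR 1.6+ the same thing is a single call over all (or selected) columns:
# df_test2 <- fillna(df_test2, 0)
```

The `ifelse`/`isNull` form is the portable one for 1.5.0; `fillna` is the idiomatic replacement once it is available.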

Is it possible to use R package in sparkR? [closed]

此生再无相见时, submitted on 2019-12-02 08:54:54
Question: (Closed as off-topic for Stack Overflow, 3 years prior.) I'm studying SparkR, and I know there are many useful R packages on CRAN. But it seems that R packages can't be used in SparkR. I'm not sure about that. Is that true? If not, could you explain how to import an R package into SparkR?

Answer 1: I'm guessing that you may be referring to the includePackage command:
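A sketch of the `includePackage` usage the answer refers to. Note this belongs to the old standalone amplab-extras SparkR API (pre-merge, with `sparkR.init` and RDD operations); it did not survive into the SparkR that ships with Spark itself:

```r
# Sketch of the old amplab-extras SparkR-pkg API:
# includePackage(sc, pkg) loads a CRAN package on every worker, so that
# closures run on the workers (e.g. via lapplyPartition) can call it.
library(SparkR)
sc <- sparkR.init(master = "local")

includePackage(sc, Matrix)  # make the Matrix package available on workers
```

In modern SparkR the closest equivalents are `spark.lapply` and `dapply`, whose function arguments can `library()` a package themselves, provided it is installed on each worker node.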

sparkR 1.6: How to predict probability when modeling with glm (binomial family)

隐身守侯, submitted on 2019-12-02 08:06:18
Question: I have just installed SparkR 1.6.1 on CentOS and am not using Hadoop. My code to model data with a discrete 'TARGET' column is as follows:

    # 'tr' is an R data frame with 104 numeric columns and one TARGET column
    # The TARGET column is either 0 or 1
    # Convert 'tr' to a Spark DataFrame
    train <- createDataFrame(sqlContext, tr)
    # 'test' is an R data frame without the TARGET column
    # Convert 'test' to a Spark DataFrame
    te <- createDataFrame(sqlContext, test)
    # Use SparkR's glm to model the data
    model <- glm
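A sketch completing the truncated call, under the assumption that TARGET is regressed on all other columns. This also illustrates the crux of the question's title: in SparkR 1.6, `predict` on a binomial glm returns predicted classes, not probabilities:

```r
# Sketch for SparkR 1.6: logistic regression via the glm wrapper.
library(SparkR)
model <- glm(TARGET ~ ., data = train, family = "binomial")

# predict() returns a Spark DataFrame; for a binomial model the
# "prediction" column holds the predicted 0/1 class.
pred <- predict(model, newData = te)
head(select(pred, "prediction"))
```

Getting the underlying probability out of the 1.6 wrapper is exactly what the asker is after; later SparkR releases expose it directly (e.g. via `spark.logit` and its probability output), but 1.6 does not.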

How to get connected to an existing session of Spark

一世执手, submitted on 2019-12-02 04:31:42
Question: I installed Spark (spark-2.1.0-bin-hadoop2.7) locally with success. Running Spark from the terminal succeeded with the command below:

    $ spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    17/01/08 12:30:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where
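A sketch of starting SparkR against that local installation; the SPARK_HOME path is a placeholder. One caveat worth hedging: `sparkR.session()` reuses a session that already exists in the same R process, but it cannot attach to the separate JVM of a `spark-shell` running in another terminal:

```r
# Sketch for Spark 2.x: load SparkR from the installed distribution
# and start (or reuse) a session in this R process.
Sys.setenv(SPARK_HOME = "/path/to/spark-2.1.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Returns the active session if one exists in this process,
# otherwise creates a new local one.
sparkR.session(master = "local[*]", appName = "from-R")
```

To genuinely share state between applications you need an external service (e.g. a thriftserver or a job server), not two front ends on one driver.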

SparkR from RStudio - gives Error in invokeJava(isStatic = TRUE, className, methodName, …) :

限于喜欢, submitted on 2019-12-02 02:54:49
I am using RStudio. After creating a session, if I try to create a DataFrame from R data it gives an error.

    Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7")
    Sys.setenv(HADOOP_HOME = "E:/winutils")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    Sys.setenv('SPARKR_SUBMIT_ARGS' = '"sparkr-shell"')
    library(SparkR)
    sparkR.session(sparkConfig = list(spark.sql.warehouse.dir = "C:/Temp"))
    localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
    df <- createDataFrame(localDF)

ERROR:

    Error in invokeJava(isStatic = TRUE, className,