SparkR

Reading Text file in SparkR 1.4.0

Anonymous (unverified), submitted on 2019-12-03 01:20:02
Question: Does anyone know how to read a text file in SparkR version 1.4.0? Are there any Spark packages available for that?

Answer 1:

Spark 1.6+

You can use the text input format to read a text file as a DataFrame:

    read.df(sqlContext = sqlContext, source = "text", path = "README.md")

Spark <= 1.5

The short answer is: you don't. SparkR 1.4 has been almost completely stripped of its low-level API, leaving only a limited subset of DataFrame operations. As you can read on an old SparkR webpage: As of April 2015, SparkR has been officially merged into
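A minimal sketch of the Spark 1.6+ approach from the answer, assuming an initialized sqlContext and that README.md exists in the working directory:

```r
# Assumes SparkR 1.6.x with a running sqlContext.
library(SparkR)

# With source = "text", each line of the file becomes one row
# in a single string column named "value".
lines <- read.df(sqlContext = sqlContext, source = "text", path = "README.md")

printSchema(lines)  # root |-- value: string
head(lines)         # first few lines of the file as rows
```

In Spark 2.x the same idea is spelled `read.text("README.md")` against a SparkSession.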

Reading csv data into SparkR after writing it out from a DataFrame

Anonymous (unverified), submitted on 2019-12-03 01:08:02
Question: I followed the example in this post to write out a DataFrame as a csv to an AWS S3 bucket. The result was not a single file but rather a folder with many .csv files. I'm now having trouble reading this folder back in as a DataFrame in SparkR. Below is what I've tried, but neither attempt results in the same DataFrame that I wrote out.

    write.df(df, 's3a://bucket/df', source = "csv")  # Creates a folder named df in the S3 bucket
    df_in1 <- read.df("s3a://bucket/df", source = "csv")
    df_in2 <- read.df("s3a://bucket/df/*.csv", source =
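A sketch of reading such a part-file folder back, assuming Spark 2.x SparkR (where "csv" is a built-in source) and the bucket path from the question. Writing and reading with an explicit header is an assumption here; without it, Spark assigns generic column names like `_c0` on read, which is one common reason the round-tripped DataFrame does not match:

```r
# Sketch: round-trip a DataFrame through a folder of CSV part files on S3.
library(SparkR)

# Write with a header row so column names survive the round trip.
write.df(df, "s3a://bucket/df", source = "csv", header = "true")

# Point read.df at the folder itself (not a glob); Spark reads every
# part file inside it and unions them into one DataFrame.
df_in <- read.df("s3a://bucket/df", source = "csv",
                 header = "true", inferSchema = "true")
```

Note that `inferSchema` costs an extra pass over the data; supplying an explicit schema to `read.df` avoids it.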

How to make a new DataFrame in sparkR

北慕城南, submitted on 2019-12-02 17:35:46
Question: In SparkR I have data as a DataFrame. I can select the rows matching a single value like this:

    newdata <- filter(data, data$column == 1)

How can I match more than one value? Say I want to select all elements in the vector list <- c(1,6,10,11,14), or if list is a DataFrame 1 6 10 11 14.

    newdata <- filter(data, data$column == list)

If I do it like this I get an error.

Answer 1: If you are ultimately trying to filter a Spark DataFrame by a list of unique values, you can do this with a merge operation. If you are
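Two ways this filtering can be sketched, assuming a SparkR (1.5+) DataFrame named `data` with a numeric column `column`:

```r
# Sketch: keep only rows whose 'column' value appears in a local vector.
library(SparkR)
wanted <- c(1, 6, 10, 11, 14)

# Option 1: %in% is defined for SparkR columns and takes a local vector.
newdata <- filter(data, data$column %in% wanted)

# Option 2 (the merge approach the answer mentions): inner-join against
# a one-column DataFrame built from the wanted values.
lookup   <- createDataFrame(sqlContext, data.frame(column = wanted))
newdata2 <- merge(data, lookup, by = "column")
```

The `%in%` form is simpler for a short local list; the merge form scales better when the list of values is itself large or already lives in Spark.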

Error while installing SparkR package using install_github

旧巷老猫, submitted on 2019-12-02 17:23:05
Question: I am trying to use the SparkR package in R. I have all the dependencies, such as devtools, Rtools.exe, etc. When I try the following command:

    install_github("amplab-extras/SparkR-pkg", subdir = "pkg")

I get the following error:

    Downloading github repo amplab-extras/SparkR-pkg@master
    Error in function (type, msg, asError = TRUE) :
      Received HTTP code 403 from proxy after CONNECT

To solve this I have set a working http_proxy and https_proxy, but it is not working and still throws the above error. Please guide

Error while installing SparkR package using install_github

六月ゝ 毕业季﹏, submitted on 2019-12-02 11:32:59
I am trying to use the SparkR package in R. I have all the dependencies, such as devtools, Rtools.exe, etc. When I try the following command:

    install_github("amplab-extras/SparkR-pkg", subdir = "pkg")

I get the following error:

    Downloading github repo amplab-extras/SparkR-pkg@master
    Error in function (type, msg, asError = TRUE) :
      Received HTTP code 403 from proxy after CONNECT

To solve this I have set a working http_proxy and https_proxy, but it is not working and still throws the above error. Please guide me, as I am new to R/RStudio. I have installed SparkR on Windows 7, 64 bit with R-3.2.x and have Spark 1.4
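An HTTP 403 from the proxy after CONNECT usually means the proxy needs credentials that the R session is not passing. A sketch of configuring them before retrying, where the host, port, and credentials are placeholders, not values from the question:

```r
# Sketch: route devtools/httr downloads through an authenticating proxy.
library(httr)
set_config(use_proxy(url = "proxy.example.com", port = 8080,
                     username = "user", password = "pass"))

# Environment variables, as the asker tried; some proxies also require
# the scheme and credentials inline.
Sys.setenv(http_proxy  = "http://user:pass@proxy.example.com:8080",
           https_proxy = "http://user:pass@proxy.example.com:8080")

library(devtools)
install_github("amplab-extras/SparkR-pkg", subdir = "pkg")
```

Note also that amplab-extras/SparkR-pkg is the pre-1.4 standalone package; since April 2015 SparkR ships inside the Spark distribution itself, so installing from that repo is rarely the right path anyway.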

if null replace with 0, otherwise default value in same column

允我心安, submitted on 2019-12-02 11:11:02
Question: In the SparkR 1.5.0 shell, I created a sample data set:

    df_test <- createDataFrame(sqlContext, data.frame(mon = c(1,2,3,4,5), year = c(2011,2012,2013,2014,2015)))
    df_test1 <- createDataFrame(sqlContext, data.frame(mon1 = c(1,2,3,4,5,6,7,8)))
    df_test2 <- join(df_test1, df_test, joinExpr = df_test1$mon1 == df_test$mon, joinType = "left_outer")

Data set df_test2:

    +----+----+------+
    |mon1| mon|  year|
    +----+----+------+
    | 7.0|null|  null|
    | 1.0| 1.0|2011.0|
    | 6.0|null|  null|
    | 3.0| 3.0|2013.0|
    | 5.0| 5
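A sketch of what the title asks for — replace the nulls produced by the left outer join with 0 while keeping the existing values in the same column — assuming the `df_test2` built above:

```r
# Sketch for SparkR 1.5.x: per-column null replacement via ifelse/isNull.
library(SparkR)
df_test2$mon  <- ifelse(isNull(df_test2$mon),  0, df_test2$mon)
df_test2$year <- ifelse(isNull(df_test2$year), 0, df_test2$year)

# In SparkR 1.6+ the same thing is a single call over all (or selected) columns:
# df_test2 <- fillna(df_test2, 0)
```

The `ifelse`/`isNull` form is the portable one for 1.5.0; `fillna` is the idiomatic replacement once it is available.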

Is it possible to use R package in sparkR? [closed]

此生再无相见时, submitted on 2019-12-02 08:54:54
Question: (Closed as off-topic for Stack Overflow, 3 years prior.) I'm studying SparkR, and I know there are many useful R packages on CRAN. But it seems that R packages can't be used in SparkR. I'm not sure about that. Is that true? If not, could you explain how to import an R package into SparkR?

Answer 1: I'm guessing that you may be referring to the includePackage command:
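A sketch of the `includePackage` usage the answer refers to. Note this belongs to the old standalone amplab-extras SparkR API (pre-merge, with `sparkR.init` and RDD operations); it did not survive into the SparkR that ships with Spark itself:

```r
# Sketch of the old amplab-extras SparkR-pkg API:
# includePackage(sc, pkg) loads a CRAN package on every worker, so that
# closures run on the workers (e.g. via lapplyPartition) can call it.
library(SparkR)
sc <- sparkR.init(master = "local")

includePackage(sc, Matrix)  # make the Matrix package available on workers
```

In modern SparkR the closest equivalents are `spark.lapply` and `dapply`, whose function arguments can `library()` a package themselves, provided it is installed on each worker node.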

sparkR 1.6: How to predict probability when modeling with glm (binomial family)

隐身守侯, submitted on 2019-12-02 08:06:18
Question: I have just installed SparkR 1.6.1 on CentOS and am not using Hadoop. My code to model data with a discrete 'TARGET' column is as follows:

    # 'tr' is an R data frame with 104 numeric columns and one TARGET column
    # The TARGET column is either 0 or 1
    # Convert 'tr' to a Spark DataFrame
    train <- createDataFrame(sqlContext, tr)
    # 'test' is an R data frame without the TARGET column
    # Convert 'test' to a Spark DataFrame
    te <- createDataFrame(sqlContext, test)
    # Use SparkR's glm to model the data
    model <- glm
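A sketch completing the truncated call, under the assumption that TARGET is regressed on all other columns. This also illustrates the crux of the question's title: in SparkR 1.6, `predict` on a binomial glm returns predicted classes, not probabilities:

```r
# Sketch for SparkR 1.6: logistic regression via the glm wrapper.
library(SparkR)
model <- glm(TARGET ~ ., data = train, family = "binomial")

# predict() returns a Spark DataFrame; for a binomial model the
# "prediction" column holds the predicted 0/1 class.
pred <- predict(model, newData = te)
head(select(pred, "prediction"))
```

Getting the underlying probability out of the 1.6 wrapper is exactly what the asker is after; later SparkR releases expose it directly (e.g. via `spark.logit` and its probability output), but 1.6 does not.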

How to get connected to an existing session of Spark

一世执手, submitted on 2019-12-02 04:31:42
Question: I installed Spark (spark-2.1.0-bin-hadoop2.7) locally with success. Running Spark from the terminal succeeded with the command below:

    $ spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    17/01/08 12:30:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where
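A sketch of starting SparkR against that local installation; the SPARK_HOME path is a placeholder. One caveat worth hedging: `sparkR.session()` reuses a session that already exists in the same R process, but it cannot attach to the separate JVM of a `spark-shell` running in another terminal:

```r
# Sketch for Spark 2.x: load SparkR from the installed distribution
# and start (or reuse) a session in this R process.
Sys.setenv(SPARK_HOME = "/path/to/spark-2.1.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Returns the active session if one exists in this process,
# otherwise creates a new local one.
sparkR.session(master = "local[*]", appName = "from-R")
```

To genuinely share state between applications you need an external service (e.g. a thriftserver or a job server), not two front ends on one driver.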

SparkR from RStudio - gives Error in invokeJava(isStatic = TRUE, className, methodName, …) :

限于喜欢, submitted on 2019-12-02 02:54:49
I am using RStudio. After creating a session, if I try to create a DataFrame from R data it gives an error.

    Sys.setenv(SPARK_HOME = "E:/spark-2.0.0-bin-hadoop2.7/spark-2.0.0-bin-hadoop2.7")
    Sys.setenv(HADOOP_HOME = "E:/winutils")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    Sys.setenv('SPARKR_SUBMIT_ARGS' = '"sparkr-shell"')
    library(SparkR)
    sparkR.session(sparkConfig = list(spark.sql.warehouse.dir = "C:/Temp"))
    localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
    df <- createDataFrame(localDF)

ERROR:

    Error in invokeJava(isStatic = TRUE, className,