r

Can you use Athena ODBC/JDBC to return the S3 location of results?

魔方 西西 submitted on 2021-02-11 14:06:01
Question: I've been using the metis package to run Athena queries via R. While this works well for small queries, there still does not seem to be a viable solution for queries that return very large datasets (tens of thousands of rows, for example). However, when running these same queries in the AWS console, it is fast and straightforward to use the download link to obtain the CSV file of the query result. This got me thinking: is there a mechanism for sending the query via R but returning/obtaining the S3:

Coloring the points by category in R

青春壹個敷衍的年華 submitted on 2021-02-11 14:05:35
Question: I am creating a scatter plot in R using the following code:
    plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y)
I get the following plot: As seen in the above plot, there are two categories, one represented by a square and the other by a circle. I want these two categories to have different colors as well. I tried the following code:
    plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = c("red", "blue"))
And I get the following plot: However, it is randomly coloring points and not taking
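A possible explanation for the seemingly random colors: a two-element `col` vector is recycled along the rows in order, so the colors alternate by row position rather than by category. A minimal sketch of one way around this is to index the palette by the category column. The data frame below is a hypothetical stand-in for `df_prob1`, and it assumes `y` takes the values 0 and 1 (the square and circle symbols).

```r
# Hypothetical stand-in for df_prob1; y is assumed to hold the category
# (0 or 1), which plot() also uses as the symbol (0 = square, 1 = circle).
df_prob1 <- data.frame(
  x1 = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
  x2 = c(2.1, 1.4, 3.3, 2.8, 4.0, 3.1),
  y  = c(0, 1, 0, 1, 1, 0)
)

# Index a two-colour palette by the category so that every row gets the
# colour belonging to its own class instead of a recycled pattern.
palette_by_class <- c("red", "blue")
point_cols <- palette_by_class[as.integer(factor(df_prob1$y))]

plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = point_cols)
```

Converting through `factor()` keeps the indexing valid even when the category codes do not start at 1.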

How to load a git branch from another R package

烈酒焚心 submitted on 2021-02-11 14:02:30
Question: In R, how do I load one package's git branch from another package? There are two packages, call them producer and consumer1. I am refactoring my code by moving a bunch of function definitions and tests from producer to consumer1. I'm creating git branches, rfctrProd and rfctrCons1, for producer and consumer1. In rfctrCons1, I need a statement doing something like
    #' @import producer, gitBranch = rfctrProd
Also, I'll need to do similarly with other packages which import producer, to make sure I

R, inconsistent date format

試著忘記壹切 submitted on 2021-02-11 14:01:11
Question: I have a date variable that originally comes from an Excel file, but its values are heterogeneous. Even though they all look like yyyy/mm/dd in Excel, when read into R the variable looks like:
    person_1 39257
    person_2 2015/2/20
    person_3 NA
How can I clean up the date variable so that every value is shown in yyyy/mm/dd format?

Answer 1: Or an option with anydate and excel_numeric_to_date:
    library(janitor)
    library(anytime)
    library(dplyr)
    coalesce(
      excel_numeric_to_date(as.numeric(dat$V2)),
      anydate(dat$V2))
    #[1]
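For readers without janitor or anytime installed, the same idea can be sketched in base R: treat all-digit entries as Excel serial numbers and parse everything else as text. The helper name `clean_excel_dates` is hypothetical, and the sketch assumes the serials use Excel's 1900 date system (origin 1899-12-30 in R) and that text entries are written as year/month/day.

```r
# Hypothetical column mixing an Excel serial, a text date and a missing value,
# mirroring the three cases shown in the question.
raw <- c(person_1 = "39257", person_2 = "2015/2/20", person_3 = NA)

clean_excel_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))
  is_serial <- !is.na(x) & grepl("^[0-9]+$", x)
  is_text <- !is.na(x) & !is_serial
  # Assumption: serial numbers count days in Excel's 1900 date system.
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Assumption: text dates are written as year/month/day.
  out[is_text] <- as.Date(x[is_text], format = "%Y/%m/%d")
  out
}

cleaned <- clean_excel_dates(raw)
format(cleaned, "%Y/%m/%d")
```

Missing values stay `NA`, and text that does not match the assumed format also becomes `NA` rather than raising an error.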

How do I create a function that defines a moving threshold along local maxima in R?

旧城冷巷雨未停 submitted on 2021-02-11 14:01:01
Question: The goal is to quantify a certain growth. The definition is as follows: every value in the sequence is compared to the preceding value, and if it is greater than the preceding one, it is taken into account (returned). If not, it is dropped. Consequently, the greater value is used as the new reference for the following ones: a threshold that moves with the ascending values. I've tried this:
    growthdata<-c(21679, 21722, 21788, 21863, 21833, 21818, 21809, 21834,
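The moving threshold described above is a running maximum, so one way to sketch it is with `cummax()`: keep a value only when it exceeds the highest value seen before it. The vector below contains only the values visible in the question (the original is longer), and the sketch assumes the first value counts as a new high because nothing precedes it.

```r
# Only the values visible in the question; the original vector is longer.
growthdata <- c(21679, 21722, 21788, 21863, 21833, 21818, 21809, 21834)

# Highest value seen strictly before each position (-Inf before the first,
# which is the assumption that lets the first value through).
previous_max <- c(-Inf, head(cummax(growthdata), -1))

# Keep only values that exceed the moving threshold.
new_highs <- growthdata[growthdata > previous_max]
new_highs
```

Using a strict `>` means a value equal to the current threshold is dropped; switch to `>=` if ties should be kept.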

How to make a plot in r with multiple lines using ggplot

早过忘川 submitted on 2021-02-11 13:59:54
Question: I am trying to draw a graph in R with 3 lines using ggplot, but the third line does not appear in the graph. I used the following code:
    us_idlpnts <- subset(unvoting, CountryName == "United States of America")
    rus_idlpnts <- subset(unvoting, CountryName == "Russia")
    mdn_idl_pnt <- summarize(unvoting,
      PctAgreeUS = median(PctAgreeUS, na.rm=T),
      PctAgreeRUSSIA = median(PctAgreeRUSSIA, na.rm=T),
      idealpoint = median(idealpoint, na.rm=T),
      Year = median(Year, na.rm= T))
    ggplot(NULL, aes(Year,

How to make a plot in r with multiple lines using ggplot

流过昼夜 submitted on 2021-02-11 13:58:53
Question: I am trying to draw a graph in R with 3 lines using ggplot, but the third line does not appear in the graph. I used the following code:
    us_idlpnts <- subset(unvoting, CountryName == "United States of America")
    rus_idlpnts <- subset(unvoting, CountryName == "Russia")
    mdn_idl_pnt <- summarize(unvoting,
      PctAgreeUS = median(PctAgreeUS, na.rm=T),
      PctAgreeRUSSIA = median(PctAgreeRUSSIA, na.rm=T),
      idealpoint = median(idealpoint, na.rm=T),
      Year = median(Year, na.rm= T))
    ggplot(NULL, aes(Year,

Regular Expression R: Select the above or below lines of a regexp selection while meeting another regexp criteria

无人久伴 submitted on 2021-02-11 13:58:28
Question: I am working with a text document similar to the examples below.
    File <- c("Location Name Code and Label Frequency Percentage",
      " During the past 30 days, on how many days did you carry a weapon",
      "44-44 Q13 such as a gun, knife, or club on school property?",
      " 1 0 days 1,610 94.5",
      " 2 1 day 71 4.3",
      " 3 2 or 3 days 6 0.4",
      " 4 4 or 5 days 3 0.2",
      " 5 6 or more days 12 0.7",
      " Missing 48",
      "45-45 Q14 During the past 12 months, on how many days did you carry a gun?",
      " 1 0 days 1,602 91.3",
      "
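The question text is cut off, so the exact goal is not known. Going by the title, one reading is: find the lines that match a first pattern (a question code such as `44-44 Q13`), then keep the line directly above only when it meets a second criterion. A minimal sketch under that assumption follows; both patterns and the "text line that is not Missing" rule are guesses based on the visible sample, not the asker's stated requirements.

```r
# The lines visible in the question (the original vector is longer).
File <- c("Location Name Code and Label Frequency Percentage",
          " During the past 30 days, on how many days did you carry a weapon",
          "44-44 Q13 such as a gun, knife, or club on school property?",
          " 1 0 days 1,610 94.5",
          " 2 1 day 71 4.3",
          " 3 2 or 3 days 6 0.4",
          " 4 4 or 5 days 3 0.2",
          " 5 6 or more days 12 0.7",
          " Missing 48",
          "45-45 Q14 During the past 12 months, on how many days did you carry a gun?",
          " 1 0 days 1,602 91.3")

# First criterion (assumed): lines starting with a location range and a code.
hits <- grep("^[0-9]+-[0-9]+ Q[0-9]+", File)

# Candidate lines directly above each hit, skipping a hit on line 1.
above <- hits[hits > 1L] - 1L

# Second criterion (assumed): the line above is indented text, not a
# response row (which starts with a digit) and not the "Missing" row.
is_question_text <- grepl("^\\s+[A-Za-z]", File[above]) &
  !grepl("^\\s*Missing", File[above])

lines_above <- above[is_question_text]
File[lines_above]
```

Selecting the line below instead only needs `hits + 1L`, with a matching check that the index does not run past `length(File)`.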

Regular Expression R: Select the above or below lines of a regexp selection while meeting another regexp criteria

岁酱吖の submitted on 2021-02-11 13:57:24
Question: I am working with a text document similar to the examples below.
    File <- c("Location Name Code and Label Frequency Percentage",
      " During the past 30 days, on how many days did you carry a weapon",
      "44-44 Q13 such as a gun, knife, or club on school property?",
      " 1 0 days 1,610 94.5",
      " 2 1 day 71 4.3",
      " 3 2 or 3 days 6 0.4",
      " 4 4 or 5 days 3 0.2",
      " 5 6 or more days 12 0.7",
      " Missing 48",
      "45-45 Q14 During the past 12 months, on how many days did you carry a gun?",
      " 1 0 days 1,602 91.3",
      "

How to use a loop to delete all rows with negative values in R

大城市里の小女人 submitted on 2021-02-11 13:56:26
Question: I am new to loops. I have an unwieldy data frame that I want to cut down so that only observations (rows) without negative numbers remain. Here is where I'm stuck. This creates a null value every time instead of a trimmed-down data frame.
    mydata=for (i in names(df)) { subset(df, df[[ paste(i)]]>=0) }

Answer 1: How about a purely vectorised solution:
    DF[!rowSums(DF < 0), ]
    #  ID Items Sequence
    #1  1     D        1
    #2  1     A        2
    #5  2     B        2
Data
    DF=structure(list(ID = c(1, 1, 1, -1, 2), Items = c("D", "A", "A", "A", "B"
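On why the original attempt yields NULL: in R a `for` loop always returns `NULL`, and the `subset()` result inside the loop body is never stored. If a loop is still preferred over the vectorised answer, a minimal sketch is to accumulate a logical row filter across the columns and subset once at the end. The data frame and its column names below are hypothetical, and the sketch assumes the numeric columns contain no `NA` values.

```r
# Hypothetical data; the column names and values are illustrative only.
obs <- data.frame(
  id    = c(1, 1, 1, -1, 2),
  item  = c("D", "A", "A", "A", "B"),
  score = c(1, 2, -3, 4, 2)
)

# Start by keeping every row, then rule rows out column by column.
keep <- rep(TRUE, nrow(obs))
for (col in names(obs)) {
  # Only numeric columns can hold negative numbers; skip the rest.
  if (is.numeric(obs[[col]])) {
    keep <- keep & obs[[col]] >= 0
  }
}

# Subset once, after the loop, and store the result.
trimmed <- obs[keep, ]
trimmed
```

The filter is built up inside the loop and applied afterwards, which avoids relying on the loop's own return value.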