dplyr | 易学教程

Factor Analysis using sparklyr in Databricks

Question: I would like to perform a factor analysis on data that I pull into R with dplyr::collect() in Databricks, but because of the data's size I get this error:

Error : org.apache.spark.sql.execution.OutOfMemorySparkException: Total memory usage during row decode exceeds spark.driver.maxResultSize (4.0 GB). The average row size was 82.0 B

Is there a function in sparklyr with which I can do this analysis without collecting the data?

Source: https://stackoverflow.com/questions/64113459/factor-analysis-using-sparklyr-in
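
One workaround, sketched here rather than taken from the thread: stats::factanal() can fit the model from a correlation (or covariance) matrix passed as covmat, so only a small p x p matrix has to leave Spark instead of every row. The commented Spark step assumes sparklyr's ml_corr() and sdf_nrow() on an all-numeric, NA-free table called "my_table" (a hypothetical name). The runnable part uses R's built-in ability.cov data as a stand-in for the collected matrix.

```r
# Spark side (not run here; assumes a Databricks sparklyr connection and a
# hypothetical all-numeric table "my_table" without missing values):
# library(sparklyr)
# sc  <- spark_connect(method = "databricks")
# tbl <- dplyr::tbl(sc, "my_table")
# cor_mat <- as.matrix(ml_corr(tbl))   # correlation matrix computed in Spark
# rownames(cor_mat) <- colnames(cor_mat)
# n <- sdf_nrow(tbl)                   # row count, also computed in Spark

# Local stand-in for the collected matrix: built-in ability.cov data
cor_mat <- cov2cor(ability.cov$cov)
n <- ability.cov$n.obs

# factanal() works from the matrix alone; n.obs is only needed for the
# goodness-of-fit test
fa <- factanal(factors = 2, covmat = cor_mat, n.obs = n)
print(fa$loadings)
```

Only the matrix and the row count cross from Spark to the driver, so the size limit in the error no longer depends on the number of rows.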

How to separate values in a column and convert to numeric values?

Question: I have a dataset where the values are collapsed, so each row has multiple inputs in one column. For example:

Gene   Score1
Gene1  NA, NA, NA, 0.03, -0.3
Gene2  NA, 0.2, 0.1

I am trying to unpack this and then select the maximum absolute value per row for the Score1 column, and also record in a new column whether that maximum absolute value was negative before taking the absolute value. So the output for the example is:

Gene   Score1  Negatives1
Gene1  0.3     1
Gene2  0.2     0

#Score1 is now the maximum absolute value and if
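
A minimal sketch, assuming Score1 is stored as comma-separated strings and that "NA" marks a missing entry: a helper parses each string and returns the element with the largest absolute value, sign included, and mutate() derives Score1 and Negatives1 from it. The helper name max_abs_signed() is hypothetical; rows whose entries are all missing get NA in both columns.

```r
library(dplyr)

df <- data.frame(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

# Parse "a, b, c" into numbers and return the element with the largest
# absolute value (sign kept); NA if every entry is missing
max_abs_signed <- function(s) {
  v <- suppressWarnings(as.numeric(strsplit(s, ",\\s*")[[1]]))
  if (all(is.na(v))) return(NA_real_)
  v[which.max(abs(v))]   # which.max() skips NA values
}

result <- df %>%
  mutate(
    top        = vapply(Score1, max_abs_signed, numeric(1), USE.NAMES = FALSE),
    Score1     = abs(top),
    Negatives1 = as.integer(top < 0)
  ) %>%
  select(Gene, Score1, Negatives1)
```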

Select a maximum value across rows and columns with grouped data

Question: The data below have an IndID field as well as three columns containing numbers, including NA in some instances, with a varying number of rows for each IndID.

library(dplyr)
n <- 10
set.seed(123)
dat <- data.frame(IndID = sample(c("AAA", "BBB", "CCC", "DDD"), n, replace = TRUE),
                  Num1 = c(2, 4, 2, 4, 4, 1, 3, 4, 3, 2),
                  Num2 = sample(c(1, 2, 5, 8, 7, 8, NA), n, replace = TRUE),
                  Num3 = sample(c(NA, NA, NA, 8, 7, 9, NA), n, replace = TRUE)) %>%
  arrange(IndID)
head(dat)

  IndID Num1 Num2 Num3
1   AAA    1   NA    7
2   BBB    2   NA   NA
3   BBB    2    7
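
The question is cut off before it states the exact goal, so this sketch assumes the goal is the single largest value per IndID, taken over all rows of the group and all three Num columns. max() accepts several vectors at once and na.rm = TRUE drops the NAs; the second pipeline shows a mutate() variant that keeps every row and attaches the group maximum. The data-generating code is the question's own.

```r
library(dplyr)

n <- 10
set.seed(123)
dat <- data.frame(IndID = sample(c("AAA", "BBB", "CCC", "DDD"), n, replace = TRUE),
                  Num1 = c(2, 4, 2, 4, 4, 1, 3, 4, 3, 2),
                  Num2 = sample(c(1, 2, 5, 8, 7, 8, NA), n, replace = TRUE),
                  Num3 = sample(c(NA, NA, NA, 8, 7, 9, NA), n, replace = TRUE)) %>%
  arrange(IndID)

# One row per IndID: the maximum over every row of the group and all
# three Num columns, ignoring NA
group_max <- dat %>%
  group_by(IndID) %>%
  summarise(MaxVal = max(Num1, Num2, Num3, na.rm = TRUE))

# Variant: keep all rows and add the group maximum as a new column
dat_flagged <- dat %>%
  group_by(IndID) %>%
  mutate(MaxVal = max(Num1, Num2, Num3, na.rm = TRUE)) %>%
  ungroup()
```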

How can I create a new column based on conditional statements and dplyr?

Question:

x  y
2  4
5  8
1  4
9  12

I have four conditions:

maxx = 3, minx = 1, maxy = 6, miny = 3 (if minx < x < maxx and miny < y < maxy, then z = apple)
maxx = 6, minx = 4, maxy = 9, miny = 7 (if minx < x < maxx and miny < y < maxy, then z = ball)
maxx = 2, minx = 0, maxy = 5, miny = 3 (if minx < x < maxx and miny < y < maxy, then z = pine)
maxx = 12, minx = 7, maxy = 15, miny = 11 (if minx < x < maxx and miny < y < maxy, then z = orange)

Expected outcome:

x  y   z
2  4   apple
5  8   ball
1  4   pine
9  12  orange
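
A minimal sketch with dplyr::case_when(), translating each condition literally with strict inequalities. case_when() checks the conditions in order and uses the first one that is TRUE; rows that match none of them get NA. The data frame simply re-enters the four example rows.

```r
library(dplyr)

df <- data.frame(x = c(2, 5, 1, 9), y = c(4, 8, 4, 12))

# Each branch encodes minx < x < maxx and miny < y < maxy
result <- df %>%
  mutate(z = case_when(
    x > 1 & x < 3  & y > 3  & y < 6  ~ "apple",
    x > 4 & x < 6  & y > 7  & y < 9  ~ "ball",
    x > 0 & x < 2  & y > 3  & y < 5  ~ "pine",
    x > 7 & x < 12 & y > 11 & y < 15 ~ "orange"
  ))
```

dplyr::between() is inclusive on both ends, so it would not reproduce the strict bounds stated in the question; the explicit comparisons do.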

Use dplyr's filter and mutate to generate a new variable

Question: I chose the hflights dataset as an example. I am trying to create a variable/column that contains the TailNum of the planes, but only for the planes that are in the 10% with the longest air time.

install.packages("hflights")
library("hflights")
library(dplyr)
flights <- as_tibble(hflights)
flights %>%
  filter(cume_dist(desc(AirTime)) < 0.1) %>%
  mutate(new_var = TailNum)

EDIT: The resulting data frame has only 22208 obs instead of 227496. Is there a way to keep the original data frame, but add a new variable with
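
The rows disappear because filter() keeps only the matching rows. A sketch of the usual alternative, assuming NA is acceptable for planes outside the top 10%: move the condition into mutate() with if_else(). Because hflights may not be installed, the code uses a small made-up data frame with the same column names (TailNum, AirTime); with the real data, rows whose AirTime is NA also get NA, since cume_dist() returns NA for them.

```r
library(dplyr)

# Hypothetical stand-in for hflights: same column names, made-up values
flights <- data.frame(TailNum = sprintf("N%03d", 1:50),
                      AirTime = 1:50,
                      stringsAsFactors = FALSE)

# All rows are kept; new_var holds TailNum only for the top 10% by
# air time and NA everywhere else
result <- flights %>%
  mutate(new_var = if_else(cume_dist(desc(AirTime)) < 0.1, TailNum, NA_character_))
```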

How to recode dataframe values to keep only those that satisfy a certain set, replace others with “other”

Question: I'm looking for a concise solution, preferably using dplyr, to clean up the values in a data frame column: values that match a certain set should be kept as they are, and all others should be recoded as "other".

Example: I have a data frame with names of animals. There are 4 legitimate animal names, but other rows contain gibberish rather than names. I want to clean up the column so that it keeps only the legitimate animal names: zebra, lion, cow, or cat.

Data:

library(tidyverse)
library(stringi)
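
A concise dplyr approach, sketched on a made-up column because the question's data-generating code is cut off: test membership with %in% and let if_else() keep matching values and replace everything else with "other". Note that %in% returns FALSE for NA, so missing values would also become "other".

```r
library(dplyr)

legit <- c("zebra", "lion", "cow", "cat")

# Hypothetical sample: legitimate names mixed with gibberish strings
df <- data.frame(animal = c("zebra", "xqzt", "lion", "cow", "plmk", "cat"),
                 stringsAsFactors = FALSE)

# Keep values found in `legit`, recode everything else as "other"
cleaned <- df %>%
  mutate(animal = if_else(animal %in% legit, animal, "other"))
```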

R dplyr window function: get the first value within the next x rows that fulfils some condition

Question: I have a dplyr data frame and a condition. For each cell, I want to know the index of the first cell within the next x rows that matches the condition. In my case, I want an additional column that holds the index of the first value that is larger than the current value by at least z. Example: here we are looking for the index of the first value in the next 3 rows that is larger than the current value by at least 3. In the case of the first row, the value is 0 and the
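
One way to build such a column, sketched with a plain helper called inside mutate() rather than a built-in dplyr window function: for row i it scans rows i+1 to i+window and returns the position of the first value that exceeds the current one by at least min_gap, or NA if there is none. The helper name first_bigger_ahead(), its arguments, and the sample values are hypothetical, since the question's own data are cut off. On a grouped data frame, the returned indices are positions within each group.

```r
library(dplyr)

# Row index of the first value among the next `window` rows that exceeds
# v[i] by at least `min_gap`; NA when no such value exists
first_bigger_ahead <- function(v, window, min_gap) {
  n <- length(v)
  vapply(seq_len(n), function(i) {
    if (i == n) return(NA_integer_)           # nothing ahead of the last row
    ahead <- (i + 1L):min(i + window, n)      # the next `window` positions
    hit <- which(v[ahead] - v[i] >= min_gap)  # which() ignores NA comparisons
    if (length(hit) == 0) NA_integer_ else as.integer(ahead[hit[1]])
  }, integer(1))
}

# Hypothetical data
df <- data.frame(value = c(0, 1, 4, 2, 8, 3, 3, 7))

result <- df %>%
  mutate(next_idx = first_bigger_ahead(value, window = 3, min_gap = 3))
```

The scan costs roughly n x window comparisons, which is fine for a small window.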
