apply

R nested map through columns

前提是你 · Submitted on 2021-02-19 06:18:08

Question: I have a function, which was solved here. It takes a column of annotations and a grouping column, and propagates each group's annotation to the rows with missing values:

    f1 <- function(data, group_col, expand_col){
      data %>%
        dplyr::group_by({{group_col}}) %>%
        dplyr::mutate(
          {{expand_col}} := dplyr::case_when(
            !is.na({{expand_col}}) ~ {{expand_col}},
            any(!is.na({{expand_col}})) & is.na({{expand_col}}) ~
              paste(unique(unlist(str_split(na.omit({{expand_col}}), " "))), collapse = " ") …
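The excerpt is cut off mid-expression; a complete, runnable sketch might look like the following, where the closing parentheses, the final ungroup(), and the toy data are assumptions on my part:

```r
library(dplyr)
library(stringr)

# Reconstructed sketch of f1(); the ending is an assumption, since the
# original excerpt is truncated after the second case_when branch.
f1 <- function(data, group_col, expand_col) {
  data %>%
    dplyr::group_by({{group_col}}) %>%
    dplyr::mutate(
      {{expand_col}} := dplyr::case_when(
        !is.na({{expand_col}}) ~ {{expand_col}},
        any(!is.na({{expand_col}})) & is.na({{expand_col}}) ~
          paste(unique(unlist(str_split(na.omit({{expand_col}}), " "))),
                collapse = " ")
      )
    ) %>%
    dplyr::ungroup()
}

# Invented toy data: group "a" has an annotation, group "b" does not.
df  <- tibble(g = c("a", "a", "b"), ann = c("x y", NA, NA))
out <- f1(df, g, ann)
out$ann  # "x y" "x y" NA
```

The `{{ }}` (embracing) syntax lets callers pass bare column names, so the sketch is called as `f1(df, g, ann)` rather than with strings.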

R - Add columns to dataframes in list by looping through elements in a vector

谁说胖子不能爱 · Submitted on 2021-02-19 04:14:54

Question: I am working with several datasets that measure the same variables over many years. I am trying to add a year variable to each dataset; more generally, I want to loop through the elements of a vector and add each one as a new column to the corresponding data frame in a list. This question is similar to mine, but I want to iteratively add each element of the vector to its matching data frame as a new column: R - New variables over several data frames in a loop. Here's sample data:

    year <- c(1:3)
    data1 <- data.frame…
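The positional pairing the question asks for can be sketched with Map(), which walks the list and the vector in lockstep (the sample values below are invented for illustration):

```r
# Sample data in the spirit of the question (values invented).
year  <- 1:3
data1 <- data.frame(obs = c(10, 11))
data2 <- data.frame(obs = c(20, 21))
data3 <- data.frame(obs = c(30, 31))

# Pair the i-th data frame with the i-th year and add it as a column.
df_list <- Map(function(df, yr) { df$year <- yr; df },
               list(data1, data2, data3), year)

df_list[[2]]$year  # both rows of the second data frame get year 2
```

Map() recycles nothing silently here: if the list and the vector have different lengths, shorter arguments are recycled with a warning, so keeping them the same length is the caller's responsibility.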

Euclidean Distances between rows of two data frames in R

做~自己de王妃 · Submitted on 2021-02-17 03:29:36

Question: Calculating Euclidean distances in R is easy; a good example can be found HERE. The vectorised form is:

    sqrt((known_data[, 1] - unknown_data[, 1])^2 + (known_data[, 2] - unknown_data[, 2])^2)

What would be the fastest, most efficient way to get the Euclidean distance from each row of one data frame to every row of another? A particular function from the apply() family? Thanks!

Answer 1: Maybe you can try outer + dist, like below:

    outer(
      1:nrow(known_data), 1:nrow(unknown_data),
      FUN = Vectorize…
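The outer + Vectorize idea from the answer can be run end to end as below; the two small data frames are invented so the distances are easy to check by hand:

```r
# Invented toy data: two known points, one unknown point.
known_data   <- data.frame(x = c(0, 3), y = c(0, 4))
unknown_data <- data.frame(x = 0, y = 0)

kd <- as.matrix(known_data)
ud <- as.matrix(unknown_data)

# Distance from every row of known_data to every row of unknown_data.
dists <- outer(
  seq_len(nrow(kd)), seq_len(nrow(ud)),
  Vectorize(function(i, j) sqrt(sum((kd[i, ] - ud[j, ])^2)))
)
dists  # 2 x 1 matrix: 0 and 5
```

Converting to matrices first avoids repeated data-frame row indexing, which is slow; the rows (0, 0) and (3, 4) against (0, 0) give distances 0 and 5.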

How to replace NA values in a data.table with na.spline

末鹿安然 · Submitted on 2021-02-16 20:09:06

Question: I'm preparing some demographic data retrieved from Eurostat for further processing, which among other things means replacing missing data with approximated values. At first I used data.frames only, but then I became convinced that data.tables might offer some advantages over regular data.frames, so I migrated to data.tables. One thing I observed while doing so was getting different results when using na.spline in combination with apply versus na.spline as part of the data…
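A minimal sketch of zoo::na.spline inside data.table's by-reference assignment, which is presumably the data.table form being compared against the apply variant (the toy series below is invented):

```r
library(data.table)
library(zoo)

# Invented toy series with gaps, indexed by year.
dt <- data.table(year = 2000:2004, pop = c(10, NA, 14, NA, 18))

# Replace NAs by spline interpolation, assigning by reference in data.table.
dt[, pop := na.spline(pop, x = year)]
dt$pop  # gaps filled by interpolation
```

Passing `x = year` makes the interpolation use the actual time index rather than row positions, which matters as soon as the observations are not evenly spaced.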

applying same function on multiple files in R

谁都会走 · Submitted on 2021-02-15 02:55:58

Question: I am new to R and currently working with a set of financial data. I have around 10 CSV files in my working directory; I want to analyze one of them and then apply the same commands to the rest. Here are the file names:

    "US%10y.csv", "UK%10y.csv", "GER%10y.csv", "JAP%10y.csv", "CHI%10y.csv",
    "SWI%10y.csv", "SOA%10y.csv", "BRA%10y.csv", "CAN%10y.csv", "AUS%10y.csv"

For example, because the Date column in the CSV files is a Factor, I need to change it to Date…
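The usual pattern is to put the cleaning steps in one function and lapply() it over the file names. In this sketch, two temporary CSVs are created only so the example is self-contained; the real analysis would use the ten "%10y.csv" files from the question:

```r
# Two temporary CSV files standing in for the real ones.
files <- file.path(tempdir(), c("US%10y.csv", "UK%10y.csv"))
for (f in files) {
  write.csv(data.frame(Date  = c("2020-01-01", "2020-01-02"),
                       Yield = c(1.5, 1.6)),
            f, row.names = FALSE)
}

# One cleaning function, applied identically to every file.
clean <- function(path) {
  df <- read.csv(path, stringsAsFactors = TRUE)  # Date comes in as a factor
  df$Date <- as.Date(df$Date)                    # factor -> Date
  df
}

all_data <- lapply(files, clean)
names(all_data) <- basename(files)
class(all_data[["US%10y.csv"]]$Date)  # "Date"
```

Naming the list by file name keeps each country's data addressable later, instead of creating ten separate variables.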

Parallelizing comparisons between two dataframes with multiprocessing

半世苍凉 · Submitted on 2021-02-10 15:57:06

Question: I've got the following function, which compares the rows of two dataframes (data and ref) and returns the indices of both rows when there's a match:

    def get_gene(row):
        m = (np.equal(row[0], ref.iloc[:, 0].values)
             & np.greater_equal(row[2], ref.iloc[:, 2].values)
             & np.less_equal(row[3], ref.iloc[:, 3].values))
        return ref.index[m] if m.any() else None

Since this takes a long time (25 min for 1.6M rows in data versus 20K rows in ref), I tried to speed things up by…

Test whether any input in a set of numbered input objects in R Shiny is empty

爱⌒轻易说出口 · Submitted on 2021-02-08 02:09:27

Question: Let's say I have created 10 selectInput dropdowns for a multi-plot export, and these selectInputs are called "xaxis_1", "xaxis_2", ..., "xaxis_10". For a single one I can write:

    if (!is.null(input$xaxis_1)) { ... do stuff }

to stop the export from running when the user hasn't entered any name and presses submit, which avoids crashes. A bit more generally, you can check:

    if (!is.null(input[[paste('xaxis', i, sep = '_')]])) { ... }

How can you write this elegantly, so that one line of code checks whether ANY…
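Outside a running Shiny session, the check can be sketched with a plain list standing in for `input`. Note that, depending on the widget, an untouched selectInput may return "" rather than NULL, so the helper below treats both as empty (an assumption about this app's inputs):

```r
# A plain list stands in for Shiny's `input` so this runs outside Shiny.
input <- list(xaxis_1 = "year", xaxis_2 = NULL, xaxis_3 = "value")

# Empty Shiny inputs can be NULL or "" (assumption about the widgets used).
is_empty <- function(x) is.null(x) || !nzchar(x)

ids <- paste("xaxis", seq_len(3), sep = "_")
any_empty <- any(vapply(ids, function(id) is_empty(input[[id]]), logical(1)))
any_empty  # TRUE: xaxis_2 is missing
```

In the real app, `seq_len(3)` would be `seq_len(10)`, and the `any(vapply(...))` expression is the single-line guard the question asks for.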
