dplyr | 易学教程

Using ifelse statement for multiple values in a column

Question: I have a table with approximately 3000 rows, with data in the form:

    Number  Type
    10001   0
    10005   7
    10006   0
    10007   14
    10012   16
    10022   14
    10023   0
    10024   0
    10029   7
    10035   17
    10045   14

I want to add a third column so that the table looks like:

    Number  Type  SCHEach
    10001   0     0
    10005   7     0
    10006   0     0
    10007   14    0
    10012   16    1
    10022   14    0
    10023   0     0
    10024   0     0
    10029   7     0
    10035   17    1
    10045   14    0

where values in the SCHEach column are based on values in the Type column. If values in the Type column are 16,17,21, or 22,
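The excerpt is cut off before the rule is stated in full, but the example table shows SCHEach set to 1 for Types 16 and 17 and to 0 elsewhere. The following is a minimal sketch under the assumption that Types 16, 17, 21 and 22 map to 1 and every other Type maps to 0. It uses `%in%` inside `ifelse()` rather than chaining several comparisons.

```r
library(dplyr)

# Sample rows copied from the question
df <- data.frame(
  Number = c(10001, 10005, 10006, 10007, 10012, 10022,
             10023, 10024, 10029, 10035, 10045),
  Type   = c(0, 7, 0, 14, 16, 14, 0, 0, 7, 17, 14)
)

# Assumption: Types 16, 17, 21 and 22 get 1; all other Types get 0
df <- df %>%
  mutate(SCHEach = ifelse(Type %in% c(16, 17, 21, 22), 1, 0))

print(df)
```

Because `%in%` never returns NA, a missing Type would fall into the 0 branch here. Whether that is the desired behaviour is not stated in the excerpt.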

Coalesce columns and create another column to specify source

Question: I'm using dplyr::coalesce() to combine several columns into one. Originally, across columns, each row has only one column with an actual value, while the other columns are NA. Based on the coalescing, I want to create an additional column that specifies the source column from which the coalesced value was taken. My attempt is inspired by existing functionality in other dplyr functions. For example, dplyr::bind_rows() has an .id argument that specifies the source dataframe for each row in
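One way to record the source column is to compute it alongside the coalesced value with `case_when()`. This is only a sketch, not the asker's attempt: the data frame and the column names `x`, `y`, `z`, `value` and `source` are hypothetical, chosen for illustration.

```r
library(dplyr)

# Hypothetical data: each row has exactly one non-NA value across x, y, z
df <- tibble(
  x = c(1, NA, NA),
  y = c(NA, 2, NA),
  z = c(NA, NA, 3)
)

result <- df %>%
  mutate(
    # First non-NA value, taken in the order x, y, z
    value = coalesce(x, y, z),
    # Name of the column that supplied it, checked in the same order
    source = case_when(
      !is.na(x) ~ "x",
      !is.na(y) ~ "y",
      !is.na(z) ~ "z"
    )
  )

print(result)
```

The conditions in `case_when()` are listed in the same order as the arguments to `coalesce()`, so the two columns stay consistent. A row in which every column is NA gets NA in both `value` and `source`.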

Ignoring NA when summing multiple columns with dplyr

Question: I am summing across multiple columns, some of which have NA. I am using dplyr::mutate and writing out the arithmetic sum of the columns to get the total. But the columns have NA and I would like to treat them as zero. I was able to get it to work with rowSums (see below), but now want to do the same using mutate. Using mutate makes the code more readable and also lets me subtract columns. The example is below.

    require(dplyr)
    data(iris)
    iris <- tbl_df(iris)
    iris[2,3] <- NA
    iris <- mutate(iris, sum =
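A minimal sketch of treating NA as zero inside `mutate()`: wrap each column in `coalesce(col, 0)` before doing the arithmetic. The excerpt is cut off before the asker's expression, so the choice of columns, the subtraction, and the names `iris2` and `total` are assumptions made for illustration.

```r
library(dplyr)

data(iris)
iris2 <- as_tibble(iris)
iris2$Petal.Length[2] <- NA  # same cell as iris[2, 3] in the question

# Assumed expression: add two columns and subtract a third,
# replacing NA with 0 in each term first
iris2 <- iris2 %>%
  mutate(total = coalesce(Sepal.Length, 0) +
                 coalesce(Petal.Length, 0) -
                 coalesce(Sepal.Width, 0))

print(head(iris2, 3))
```

Unlike `rowSums(..., na.rm = TRUE)`, this form allows any mix of addition and subtraction, at the cost of repeating `coalesce()` for every term.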

Using cut() with group_by()

Question: I am trying to bin a continuous variable into intervals, varying the cut value based on the group of the observation. A similar question has been asked previously, but it only dealt with a single column, whereas I want a solution that can be generalised to work with the group_by() function in dplyr, which allows multiple columns to be selected for the grouping. Here is a basic example dataset:

    df <- data.frame(group = c(rep("Group 1", 10), rep("Group 2", 10)), subgroup =

Renaming multiple columns with dplyr rename(across(

Question: Hey, I'm trying to rename some columns by adding "Last_" with the new version of dplyr, but I keep getting this error:

    Error: `across()` must only be used inside dplyr verbs.

This is my code:

    data %>% rename(across(everything(), ~paste0("Last_", .)))

dplyr version: v1.0.2

Answer 1: We can use rename_with instead of rename:

    library(dplyr)
    library(stringr)
    data %>% rename_with(~str_c("Last_", .), everything())

Reproducible example:

    data(iris)
    head(iris) %>% rename_with(~str_c("Last_", .), .cols =

Rolling multiple regression panel data

Question: I am trying to perform a rolling regression for time t over the last 36 months, for companies with observations in at least 18 of these months, but I am not able to make the function work. I only want the coefficient for var1. X, y, z are control variables. Here is a sample of the data and the code I am trying to run.

    structure(list(Year = c(2018, 2014, 2008, 2004, 2005, 2002, 2010, 2008, 2013, 1998),
        Month = c(6, 12, 4, 6, 4, 8, 12, 11, 3, 3),
        ISIN = c("NO0004895103", "NO0010571680", "NO0010010473",

summary stats across columns, where column names indicate groups

Question: A data frame named have includes a few thousand vectors that follow a naming pattern. Each vector name consists of a noun followed by either _a, _b, or _c. Below are the first 10 vars and obs:

    id  turtle_a  banana_a  castle_a  turtle_b  banana_b  castle_b  turtle_c  banana_c  castle_c
    A   -0.58     -0.88     -0.56     -0.53     -0.32     -0.42     -0.52     -0.89     -0.72
    B   NA        NA        NA        -0.84     -0.36     -0.26     NA        NA        NA
    C   0.00      -0.43     -0.75     -0.35     -0.88     -0.14     -0.26     -0.15     -0.81
    D   -0.81     -0.63     -0.77     -0.82     -0.83     -0.50     -0.77     -0.25     -0.07
    E   -0.25     -0.33     -0.09     -0.51     -0

R reshape name value pairs from wide to long using pivot_longer

Question: I am trying to figure out how to reshape a dataset of the names of political parties from wide to long using dplyr and pivot_longer. For each Party_ID, there are a number of constant columns attached (Party_Name_Short, Party_Name, Country, Party_in_orig_title) and a number of time-changing factors as well: election, Date, Rename, Reason, Party_Title, alliance, member_parties, split, parent_party, merger, child_party, successor, predecessor. The time-changing factors were recorded up to 11

Is there an R dplyr method for merge with all=TRUE?

Question: I have two R dataframes I want to merge. In straight R you can do:

    cost <- data.frame(farm=c('farm A', 'office'), cost=c(10, 100))
    trees <- data.frame(farm=c('farm A', 'farm B'), trees=c(20,30))
    merge(cost, trees, all=TRUE)

which produces:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA
    3 farm B   NA    30

I am using dplyr, and would prefer a solution such as:

    left_join(cost, trees)

which produces something close to what I want:

        farm cost trees
    1 farm A   10    20
    2 office  100    NA

In dplyr I can see
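For reference, dplyr's `full_join()` keeps every row from both tables, which corresponds to `merge(..., all = TRUE)`. Here is a short sketch using the two data frames from the question. The explicit `by = "farm"` and the result name `merged` are additions for illustration.

```r
library(dplyr)

cost  <- data.frame(farm = c("farm A", "office"), cost = c(10, 100))
trees <- data.frame(farm = c("farm A", "farm B"), trees = c(20, 30))

# Keep all farms from both tables; unmatched cells become NA
merged <- full_join(cost, trees, by = "farm")

print(merged)
```

Naming the key with `by` avoids the message dplyr prints when it has to guess the join columns.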
