data-manipulation

How to separate one column into multiple columns (complex column)

主宰稳场 submitted on 2019-12-04 12:55:54
I am trying to separate the column "Grade" into multiple columns according to subject and grade:

    grade <- read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv", sep = ";")
    # Rename the columns
    names(grade) <- c("Student_ID", "Name", "Venue", "Grade")
    head(grade)
    # Separate `Grade` into `subject` variables and the corresponding `Grade` columns
    library(tidyverse)
    df <- grade %>% separate(Grade, paste("V", 1:7, sep = "_"), sep = ":")
    head(df)
    # It still does not separate `subject` and `grade` independently
    # Here is what I want it to look like
    new_df <- df[c(1:5), c(1:4)]
    new
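The question's data file isn't reproduced here, and it doesn't show the exact format of the `Grade` cells. This base-R sketch therefore assumes a hypothetical format in which each cell holds comma-separated `subject:grade` pairs such as `"Math:8,Physics:7"`. It works in three steps:

1. Split each cell into pairs.
2. Split each pair on `:` to build a long table.
3. Use `reshape()` to turn that table into one column per subject.

```r
# Hypothetical data: the real Grade format is not shown in the question;
# each cell is assumed to look like "Math:8,Physics:7".
grade <- data.frame(
  Student_ID = c(1, 2),
  Name = c("An", "Binh"),
  Grade = c("Math:8,Physics:7", "Math:9,Chemistry:6"),
  stringsAsFactors = FALSE
)

# Split each cell into "subject:grade" pairs, then each pair into two parts
pairs <- strsplit(grade$Grade, ",", fixed = TRUE)
long <- do.call(rbind, lapply(seq_along(pairs), function(i) {
  parts <- strsplit(pairs[[i]], ":", fixed = TRUE)
  data.frame(
    Student_ID = grade$Student_ID[i],
    Subject = vapply(parts, `[`, character(1), 1),
    Score = as.numeric(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}))

# One column per subject (Score.Math, Score.Physics, ...)
wide <- reshape(long, idvar = "Student_ID", timevar = "Subject",
                direction = "wide")
wide
```

Students without a given subject get `NA` in that column. If tidyr is available, `separate_rows()`, `separate()` and `pivot_wider()` express the same three steps.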

Replacing NAs in a column with the values of another column

妖精的绣舞 submitted on 2019-12-04 11:02:01
Question: I wonder how to replace NAs in a column with the values of another column in R using dplyr. An MWE is below.

    Letters <- LETTERS[1:5]
    Char <- c("a", "b", NA, "d", NA)
    df1 <- data.frame(Letters, Char)
    df1
    library(dplyr)
    df1 %>% mutate(Char1 = ifelse(Char != NA, Char, Letters))

      Letters Char Char1
    1       A    a    NA
    2       B    b    NA
    3       C <NA>    NA
    4       D    d    NA
    5       E <NA>    NA

Answer 1: You can use coalesce:

    library(dplyr)
    df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)
    df1 %>% mutate(Char1 = coalesce(Char, Letters))

      Letters
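A base-R sketch of the same idea also shows why the question's attempt failed. Any comparison with `NA` yields `NA`, so `Char != NA` is `NA` in every row and `ifelse()` returns `NA` throughout. Testing with `is.na()` fixes it.

`stringsAsFactors = FALSE` is set explicitly for a reason. Before R 4.0.0, `data.frame()` converted strings to factors by default, and `ifelse()` would then return factor codes instead of letters.

```r
Letters <- LETTERS[1:5]
Char <- c("a", "b", NA, "d", NA)
df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)

# Comparing with NA always gives NA, never TRUE or FALSE
print(df1$Char != NA)

# Take Char where it is present, otherwise fall back to Letters
df1$Char1 <- ifelse(is.na(df1$Char), df1$Letters, df1$Char)
df1
```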

Replace single quotes with double quotes in a column in R

筅森魡賤 submitted on 2019-12-04 07:08:33
Question: My data frame in R has a column A containing string data with single quotes in it:

    Column A
    'Hello World'
    'Hi World'
    'Good morning world'

I would like to replace the single quotes with double quotes to get the output below:

    Column A
    "Hello World"
    "Hi World"
    "Good morning world"

Can this be achieved? Thank you in advance for reading.

Answer 1: Try this: "iris" is a sample data frame, and I am trying to replace the single quotes in the "Species" column. Since ' and " are special
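The answer above is cut off, so here is a minimal base-R sketch, not necessarily the answerer's code. It uses `gsub()` with `fixed = TRUE`, which treats `'` as a literal character rather than a regular expression. The column name `A` and the sample values come from the question. Note that `print()` displays embedded double quotes escaped as `\"`, while `cat()` shows the characters actually stored.

```r
df <- data.frame(A = c("'Hello World'", "'Hi World'", "'Good morning world'"),
                 stringsAsFactors = FALSE)

# Swap every single quote for a double quote, matching it literally
df$A <- gsub("'", "\"", df$A, fixed = TRUE)

print(df$A)            # shows escaped quotes, e.g. "\"Hello World\""
cat(df$A, sep = "\n")  # shows the stored text, e.g. "Hello World"
```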

How to shift each row of a matrix in R

馋奶兔 submitted on 2019-12-04 04:37:22
Question: I have a matrix of this form:

    a b c
    d e 0
    f 0 0

and I want to transform it into something like this:

    a b c
    0 d e
    0 0 f

The shifting pattern is: shift by 0 for row 1, by 1 for row 2, by 2 for row 3, ..., and by n-1 for row n. This can be done with a for loop, of course. I am wondering if there is a better way.

Answer 1: Assuming that your example is representative, i.e., you always have a triangular structure of letters and zeros:

    mat <- structure(c("a", "d", "f", "b", "e", "0", "c", "0"
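The answer is truncated above. The following is a separate, loop-free sketch using matrix indexing, not the answerer's method. For every target cell in row `i`, it computes the source column `j - (i - 1)`. It copies the value when that column exists and writes the fill value otherwise. Values pushed past the last column are dropped, which matches the triangular example. The helper name `shift_rows` and the `fill` argument are hypothetical.

```r
mat <- matrix(c("a", "b", "c",
                "d", "e", "0",
                "f", "0", "0"), nrow = 3, byrow = TRUE)

# Hypothetical helper: shift row i right by (i - 1) columns
shift_rows <- function(m, fill = "0") {
  r <- row(m)
  src <- col(m) - (r - 1)   # source column for each target cell
  ok <- src >= 1            # cells that receive a shifted value
  out <- m
  out[ok] <- m[cbind(r[ok], src[ok])]
  out[!ok] <- fill          # cells left empty by the shift
  out
}

shift_rows(mat)
```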

get first and last values in group – dplyr group_by with last() and first()

梦想与她 submitted on 2019-12-04 03:11:39
Question: The code below should group the data by year and then create two new columns with the first and last value of each year.

    library(dplyr)
    set.seed(123)
    d <- data.frame(
      group = rep(1:3, each = 3),
      year = rep(seq(2000, 2002, 1), 3),
      value = sample(1:9, replace = TRUE))
    d %>%
      group_by(group) %>%
      mutate(
        first = dplyr::first(value),
        last = dplyr::last(value)
      )

However, it does not work as it should. The expected result would be:

      group  year value first  last
      <int> <dbl> <int> <int> <int>
    1     1  2000     3     3     4
    2     1  2001
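The excerpt doesn't say what goes wrong, so this does not diagnose the dplyr code. It is a base-R cross-check using `ave()`, which applies a function within each group and repeats the result over that group's rows. Because "first" and "last" follow row order, the sketch sorts by year within each group before taking them.

```r
set.seed(123)
d <- data.frame(group = rep(1:3, each = 3),
                year = rep(seq(2000, 2002, 1), 3),
                value = sample(1:9, replace = TRUE))

# Make sure "first"/"last" follow the year order within each group
d <- d[order(d$group, d$year), ]

# ave() returns one value per row, repeating each group's result
d$first <- ave(d$value, d$group, FUN = function(v) v[1])
d$last  <- ave(d$value, d$group, FUN = function(v) v[length(v)])
d
```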

Passing strings as arguments in dplyr verbs

妖精的绣舞 submitted on 2019-12-04 00:37:35
I would like to be able to define arguments for dplyr verbs as strings, e.g.

    condition <- "dist > 50"

and then use these strings in dplyr functions:

    library(dplyr)
    ds <- cars
    ds1 <- ds %>% filter(eval(condition))
    ds1

But it throws an error:

    Error: filter condition does not evaluate to a logical vector.

The code should evaluate as:

    ds1 <- ds %>% filter(dist > 50)
    ds1

Resulting in:

       speed dist
    1     14   60
    2     14   80
    3     15   54
    4     18   56
    5     18   76
    6     18   84
    7     19   68
    8     20   52
    9     20   56
    10    20   64
    11    22   66
    12    23   54
    13    24   70
    14    24   92
    15    24   93
    16    24  120
    17    25   85

Question: How do I pass a string as an argument to a dplyr verb? Since
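A minimal base-R sketch of the underlying idea: a string is just text until it is parsed into an expression. `parse(text = ...)` does the parsing, and `eval(..., envir = cars)` evaluates the result with the data frame's columns in scope. The sketch swaps dplyr's `filter()` for ordinary row indexing so that it has no package dependencies.

```r
condition <- "dist > 50"

# Turn the string into an expression and evaluate it using cars' columns
keep <- eval(parse(text = condition), envir = cars)

ds1 <- cars[keep, ]
rownames(ds1) <- NULL  # renumber rows 1..n, as filter() does
ds1
```

Within dplyr itself, the same parsing step is commonly written as `ds %>% filter(!!rlang::parse_expr(condition))`. Either way, only evaluate strings you trust, since parsing and evaluating text will run arbitrary code.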

Writing a generic function for “find and replace” in R

时光总嘲笑我的痴心妄想 submitted on 2019-12-03 21:51:39
I need to write a generic "find and replace" function in R. How can I write a function that takes the following inputs:

- a CSV file (or data frame)
- a string to find, for example "name@email.com"
- a string to replace the found string with, for example "medium"

and rewrites the CSV file/data frame so that all occurrences of the found string are replaced with the replacement string?

Here's a quick function to do the job:

    library(stringr)
    replace_all <- function(df, pattern, replacement) {
      char <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
      df[char] <- lapply(df[char], str_replace_all,
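The answer's function is cut off above. The sketch below is a base-R variant with hypothetical names (`replace_all_fixed`, `replace_in_csv`), not the answerer's exact code. It uses `gsub(..., fixed = TRUE)` instead of `stringr::str_replace_all()` so that a pattern such as `"name@email.com"` is matched literally; as a regular expression, the `.` would match any character. Factor columns are converted to character in the process.

```r
# Hypothetical helper: replace a literal string in every text column
replace_all_fixed <- function(df, pattern, replacement) {
  is_text <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  df[is_text] <- lapply(df[is_text], function(x) {
    gsub(pattern, replacement, as.character(x), fixed = TRUE)
  })
  df
}

# Hypothetical wrapper: apply the same replacement to a CSV file in place
replace_in_csv <- function(path, pattern, replacement) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df <- replace_all_fixed(df, pattern, replacement)
  write.csv(df, path, row.names = FALSE)
  invisible(df)
}

contacts <- data.frame(id = 1:2,
                       email = c("name@email.com", "other@email.com"),
                       stringsAsFactors = FALSE)
replace_all_fixed(contacts, "name@email.com", "medium")
```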

How to get next number in sequence in R

自古美人都是妖i submitted on 2019-12-03 19:42:41
Question: I need to automate the process of getting the next number(s) in a given sequence. Can we make a function that takes two inputs, a vector of numbers (e.g. 3, 7, 13, 21) and how many next numbers to return?

    seqNext <- function(sequ, n) { ... }  # `next` is reserved in R, so the count is named `n`
    seqNext(c(3, 7, 13, 21), 3)
    # 31 43 57
    seqNext(c(37, 26, 17, 10), 1)
    # 5

Answer 1: By the power of maths!

    x1 <- c(3, 7, 13, 21)
    dat <- data.frame(x = seq_along(x1), y = x1)
    predict(lm(y ~ poly(x, 2), data = dat), newdata = list(x = 5:15))
    #  1   2   3   4   5   6   7   8   9  10  11
    # 31  43  57  73  91 111 133
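One way to wrap the answer's approach into the `seqNext()` function the question asks for is sketched below. Like the answer, it assumes the sequence follows a low-degree polynomial. The degree is quadratic by default via a hypothetical `degree` argument, and the fit needs more points than the degree. Both example sequences in the question are exactly quadratic (3, 7, 13, 21 is x² + x + 1), so the predictions are exact up to floating-point noise.

```r
# Hypothetical helper: predict the next n terms by fitting a polynomial
# of the given degree to the terms' positions (1, 2, 3, ...)
seqNext <- function(sequ, n, degree = 2) {
  dat <- data.frame(x = seq_along(sequ), y = sequ)
  fit <- lm(y ~ poly(x, degree), data = dat)
  new_x <- length(sequ) + seq_len(n)
  unname(predict(fit, newdata = data.frame(x = new_x)))
}

# round() removes tiny floating-point noise from the fitted values
round(seqNext(c(3, 7, 13, 21), 3))   # 31 43 57
round(seqNext(c(37, 26, 17, 10), 1)) # 5
```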

pandas reset_index after groupby.value_counts()

杀马特。学长 韩版系。学妹 submitted on 2019-12-03 08:31:53
Question: I am trying to group by one column and compute value counts on another column.

    import pandas as pd

    dftest = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                           'Amt': [20, 20, 20, 30, 30, 30, 30, 40, 40, 10, 10, 40, 40, 40]})
    print(dftest)

dftest looks like:

        A  Amt
    0   1   20
    1   1   20
    2   1   20
    3   1   30
    4   1   30
    5   1   30
    6   1   30
    7   1   40
    8   1   40
    9   2   10
    10  2   10
    11  2   40
    12  2   40
    13  2   40

Perform the grouping:

    grouper = dftest.groupby('A')
    df_grouped = grouper['Amt'].value_counts()

which gives:

    A  Amt
    1  30     4
       20     3
       40     2
    2  40     3
       10     2
    Name: Amt,

Generating a moving sum variable in R

雨燕双飞 submitted on 2019-12-03 07:50:48
I suspect this is a fairly simple question with multiple solutions, but I'm still a bit of a novice in R, and an exhaustive search didn't yield answers that addressed what I want to do. I'm trying to create, for lack of a better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of all previous observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of what I would like to do, where the sum variable is the sum of all x in the five years prior to the
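The rest of the example is cut off, so this is only a sketch under stated assumptions:

- The data frame is hypothetical, with `year` and `x` columns.
- Windows are defined by calendar year rather than row position. Missing years therefore contribute nothing, and several rows in the same year are all counted.
- `lagged_sum()` is a hypothetical helper that sums `x` over years `year - k` through `year - 1`.

```r
# Hypothetical data: one observation per year
dat <- data.frame(year = 1980:1989,
                  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# Hypothetical helper: for each row, sum x over the k years strictly
# before that row's year (year - k through year - 1)
lagged_sum <- function(year, x, k) {
  vapply(year, function(y) sum(x[year >= y - k & year <= y - 1]), numeric(1))
}

dat$sum3 <- lagged_sum(dat$year, dat$x, 3)
dat$sum5 <- lagged_sum(dat$year, dat$x, 5)
dat  # e.g. 1986: sum5 = 2 + 3 + 4 + 5 + 6 = 20
```

Early years only have partial windows, and the first year sums to 0. Set those sums to `NA` if incomplete windows should be excluded.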