data-manipulation | 易学教程

How to separate one column to multiple column (complex column)

I am trying to separate the column "Grade" into multiple columns according to subject and grade.

    grade <- read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv", sep = ";")
    # Rename the columns
    names(grade) <- c("Student_ID", "Name", "Venue", "Grade")
    head(grade)
    # Separate `Grade` into `subject` variables and corresponding `Grade` columns
    library(tidyverse)
    df <- grade %>% separate(Grade, paste("V", 1:7, sep = "_"), sep = ":")
    head(df)
    # This still does not separate `subject` and `grade` independently
    # Here is what I want it to look like
    new_df <- df[c(1:5), c(1:4)]
    new

Replacing NAs in a column with the values of other column

Question

I wonder how to replace NAs in a column with the values of another column in R using dplyr. An MWE is below.

    Letters <- LETTERS[1:5]
    Char <- c("a", "b", NA, "d", NA)
    df1 <- data.frame(Letters, Char)
    df1
    library(dplyr)
    df1 %>% mutate(Char1 = ifelse(Char != NA, Char, Letters))

      Letters Char Char1
    1       A    a    NA
    2       B    b    NA
    3       C <NA>    NA
    4       D    d    NA
    5       E <NA>    NA

Answer 1:

You can use coalesce:

    library(dplyr)
    df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)
    df1 %>% mutate(Char1 = coalesce(Char, Letters))
    Letters
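A note on why the attempt in the question yields only NA: in R, any comparison with NA, including `Char != NA`, evaluates to NA, so `ifelse()` never picks either branch. The following is a minimal base R sketch, offered as an alternative to the `coalesce` approach rather than as the answer's own code, that tests for missingness with `is.na()` instead:

```r
Letters <- LETTERS[1:5]
Char <- c("a", "b", NA, "d", NA)
df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)

# Comparing with NA always yields NA, so test with is.na() instead
df1$Char1 <- ifelse(is.na(df1$Char), df1$Letters, df1$Char)
df1
```

Here `Char1` keeps the value of `Char` where it is present and falls back to `Letters` where it is missing.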

Replace single quote with double quote in a column in R

Question

My data frame in R has a column A containing string data wrapped in single quotes.

    Column A
    'Hello World'
    'Hi World'
    'Good morning world'

What I would like to do is replace the single quotes with double quotes to get output like the below.

    Column A
    "Hello World"
    "Hi World"
    "Good morning world"

Can this be achieved? Thank you in advance for reading.

Answer 1:

Try this: "iris" is a sample data frame, and I am replacing the single quotes in its "Species" column. Since ' and " are special
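The answer above is cut off, so what follows is not its code. As a minimal sketch, assuming the column is a character vector named `A` (a hypothetical name standing in for "Column A"), base R's `gsub()` with `fixed = TRUE` can swap the quote characters:

```r
# Hypothetical data frame mirroring the example in the question
df <- data.frame(
  A = c("'Hello World'", "'Hi World'", "'Good morning world'"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats the pattern as a literal character, not a regex
df$A <- gsub("'", '"', df$A, fixed = TRUE)

# writeLines() shows the strings as stored; print() would escape the quotes
writeLines(df$A)
```

Note that `print(df)` displays the double quotes escaped as `\"`, which is only a display convention; the stored strings contain plain double quotes.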

How to shift each row of a matrix in R

Question

I have a matrix of this form:

    a b c
    d e 0
    f 0 0

and I want to transform it into something like this:

    a b c
    0 d e
    0 0 f

The shifting pattern is this:

    shift by 0 for row 1
    shift by 1 for row 2
    shift by 2 for row 3
    ...
    shift by n-1 for row n

This can be done with a for loop, of course. I am wondering if there is a better way?

Answer 1:

Assuming that your example is representative, i.e., you always have a triangular structure of letters and zeros:

    mat <- structure(c("a", "d", "f", "b", "e", "0", "c", "0"
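Since the answer is truncated, here is an independent sketch of the shifting pattern described in the question. It treats the shift of row i as a cyclic rotation to the right by i - 1 positions, which is an assumption: it reproduces the desired output only because the entries that wrap around are zeros in this example.

```r
# The example matrix from the question, filled row by row
mat <- matrix(c("a", "b", "c",
                "d", "e", "0",
                "f", "0", "0"), nrow = 3, byrow = TRUE)
n <- ncol(mat)

# Rotate row i to the right by (i - 1) positions using modular indexing
shifted <- t(sapply(seq_len(nrow(mat)), function(i) {
  k <- (i - 1) %% n
  mat[i, (seq_len(n) - k - 1) %% n + 1]
}))
shifted
```

`sapply()` returns the rotated rows as columns, hence the final `t()`.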

get first and last values in group – dplyr group_by with last() and first()

Question

The code below should group the data by year and then create two new columns with the first and last value of each year.

    library(dplyr)
    set.seed(123)
    d <- data.frame(
      group = rep(1:3, each = 3),
      year = rep(seq(2000, 2002, 1), 3),
      value = sample(1:9, replace = TRUE))
    d %>%
      group_by(group) %>%
      mutate(
        first = dplyr::first(value),
        last = dplyr::last(value)
      )

However, it does not work as it should. The expected result would be

      group  year value first  last
      <int> <dbl> <int> <int> <int>
    1     1  2000     3     3     4
    2     1  2001
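The excerpt ends before any answer, so the cause of the problem is not shown here. As an independent cross-check of what the `first` and `last` columns should contain, a base R sketch using `ave()` follows. The `value` numbers are made up for illustration (they are not the output of the seeded `sample()` call), and rows are assumed to be sorted by year within each group.

```r
# Illustrative data: the values are hypothetical, not from the question
d <- data.frame(
  group = rep(1:3, each = 3),
  year  = rep(2000:2002, 3),
  value = c(3, 8, 4, 8, 9, 1, 5, 9, 5)
)

# ave() applies FUN within each group and recycles the result per row
d$first <- ave(d$value, d$group, FUN = function(v) v[1])
d$last  <- ave(d$value, d$group, FUN = function(v) v[length(v)])
d
```

If a dplyr pipeline gives different numbers from this on the same data, comparing the two is one way to narrow down where the grouping goes wrong.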

Passing strings as arguments in dplyr verbs

I would like to be able to define arguments for dplyr verbs as strings

    condition <- "dist > 50"

and then use these strings in dplyr functions:

    require(ggplot2)
    ds <- cars
    ds1 <- ds %>% filter(eval(condition))
    ds1

But it throws an error:

    Error: filter condition does not evaluate to a logical vector.

The code should evaluate as:

    ds1 <- ds %>% filter(dist > 50)
    ds1

resulting in:

    ds1
       speed dist
    1     14   60
    2     14   80
    3     15   54
    4     18   56
    5     18   76
    6     18   84
    7     19   68
    8     20   52
    9     20   56
    10    20   64
    11    22   66
    12    23   54
    13    24   70
    14    24   92
    15    24   93
    16    24  120
    17    25   85

Question: How to pass a string as an argument in a dplyr verb? Since
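One possible approach, sketched here in base R rather than dplyr: `eval(condition)` simply returns the string unchanged, so the text first has to be turned into an expression with `parse(text = ...)` and then evaluated with the data frame as its environment. This is a sketch under that assumption, not necessarily what the truncated text goes on to propose, and evaluating arbitrary strings is only reasonable for trusted input.

```r
condition <- "dist > 50"
ds <- cars  # built-in dataset with columns speed and dist

# Parse the string into an expression, then evaluate it inside ds
keep <- eval(parse(text = condition), envir = ds)
ds1 <- ds[keep, ]
nrow(ds1)
```

The logical vector `keep` could equally be handed to a dplyr `filter()` call; base subsetting is used here only to keep the sketch free of package dependencies.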

Writing a generic function for “find and replace” in R

I need to write a generic function for "find and replace" in R. How can I write a function that takes the following inputs

- A CSV file (or data frame)
- A string to find, for example "name@email.com"
- A string to replace the found string with, for example "medium"

and rewrites the CSV file/data frame so that all the found strings are replaced with the replacement string?

Here's a quick function to do the job:

    library(stringr)
    replace_all <- function(df, pattern, replacement) {
      char <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
      df[char] <- lapply(df[char], str_replace_all,
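The function above is cut off in this excerpt. As a self-contained illustration of the same idea, here is a base R sketch; the name `replace_all_base` is hypothetical, and it assumes the search string should be matched literally (`fixed = TRUE`) rather than as a regular expression. Factor columns come back as character columns in this version.

```r
# Hypothetical helper: literal find-and-replace across text columns
replace_all_base <- function(df, pattern, replacement) {
  # Only factor and character columns can contain the search string
  is_text <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  df[is_text] <- lapply(df[is_text], function(x) {
    gsub(pattern, replacement, as.character(x), fixed = TRUE)
  })
  df
}

# Made-up example data
df <- data.frame(id = 1:2,
                 contact = c("name@email.com", "other"),
                 stringsAsFactors = FALSE)
out <- replace_all_base(df, "name@email.com", "medium")
out
```

For a CSV file, the data frame would be read with `read.csv()` beforehand and written back with `write.csv()` afterwards.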

How to get next number in sequence in R

Question

I need to automate the process of getting the next number(s) in a given sequence. Can we make a function which takes two inputs

- a vector of numbers (3, 7, 13, 21, e.g.)
- how many next numbers

    seqNext <- function(sequ, next) { .. }
    seqNext( c(3,7,13,21), 3) # 31 43 57
    seqNext( c(37,26,17,10), 1) # 5

Answer 1:

By the power of maths!

    x1 <- c(3,7,13,21)
    dat <- data.frame(x=seq_along(x1), y=x1)
    predict(lm(y ~ poly(x, 2), data=dat), newdata=list(x=5:15))
    #   1   2   3   4   5   6   7   8   9  10  11
    #  31  43  57  73  91 111 133
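The quadratic fit from the answer can be wrapped into the function the question asks for. The sketch below makes a few assumptions: the sequence really is generated by a polynomial of degree at most 2, the terms are integers (so the predictions are rounded), and the second argument is called `n_next` because `next` is a reserved word in R and cannot be used as an argument name.

```r
# Sketch: extend a sequence by fitting a degree-2 polynomial to its terms
seq_next <- function(sequ, n_next) {
  dat <- data.frame(x = seq_along(sequ), y = sequ)
  fit <- lm(y ~ poly(x, 2), data = dat)
  new_x <- length(sequ) + seq_len(n_next)
  # Round to remove floating-point noise; assumes integer sequences
  unname(round(predict(fit, newdata = data.frame(x = new_x))))
}

seq_next(c(3, 7, 13, 21), 3)    # 31 43 57
seq_next(c(37, 26, 17, 10), 1)  # 5
```

Sequences that follow a different rule (geometric, Fibonacci-like, and so on) would need a different model; the fixed degree of 2 is a design choice taken from the answer's example.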

pandas reset_index after groupby.value_counts()

Question

I am trying to group by a column and compute value counts on another column.

    import pandas as pd
    dftest = pd.DataFrame({'A': [1,1,1,1,1,1,1,1,1,2,2,2,2,2],
                           'Amt': [20,20,20,30,30,30,30,40,40,10,10,40,40,40]})
    print(dftest)

dftest looks like

        A  Amt
    0   1   20
    1   1   20
    2   1   20
    3   1   30
    4   1   30
    5   1   30
    6   1   30
    7   1   40
    8   1   40
    9   2   10
    10  2   10
    11  2   40
    12  2   40
    13  2   40

Perform the grouping:

    grouper = dftest.groupby('A')
    df_grouped = grouper['Amt'].value_counts()

which gives

    A  Amt
    1  30     4
       20     3
       40     2
    2  40     3
       10     2
    Name: Amt,

Generating a moving sum variable in R

I suspect this is a fairly simple question with multiple solutions, but I'm still a bit of a novice in R, and an exhaustive search didn't yield answers that spoke well to what I want to do. I'm trying to create, for lack of a better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of all previous observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of what I would like to do, where the sum variable is the sum of all x in the five years prior to the
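The lagged moving sum described above can be sketched in base R as follows. The function name `lagged_sum` and the sample values are hypothetical, and the sketch assumes a single series with one observation per consecutive year, already sorted by year; a panel with several units would need the function applied within each unit.

```r
# Sum of the k observations preceding each position (current one excluded)
lagged_sum <- function(x, k) {
  vapply(seq_along(x), function(i) {
    # Not enough history for the first k positions
    if (i <= k) NA_real_ else sum(x[(i - k):(i - 1)])
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5, 6, 7)  # made-up yearly values
lagged_sum(x, 3)  # NA NA NA 6 9 12 15
```

A 5-year version is `lagged_sum(x, 5)`; positions without a full window are left as NA rather than summed over a shorter window, which is one of several reasonable conventions.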