data-manipulation

Average of subgroup of 2nd column, grouped by 1st column

Submitted by 北城以北 on 2019-11-28 11:54:09

Question: Suppose I have a matrix A whose first column is a group label, and I want the average of the second column within each group. So from A I want to construct B:

    A =
    1 2
    1 3
    2 4
    2 2

    B =
    1 2.5
    2 3

The best I have managed so far is a long for/if loop that uses the averaging function to get to B, but I suspect there is a simpler method. Is there?

Answer 1: I hadn't used accumarray before, so due to the comment by @Dan I decided to give it a try. At first I tried a naive version and used histc to count
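As an aside, the grouped-mean logic the question asks for can be sketched in plain Python (this mirrors the question's matrix A; it is an illustration of the idea, not the MATLAB accumarray answer itself):

```python
from collections import defaultdict

# Rows of the question's matrix A as (group, value) pairs
A = [(1, 2), (1, 3), (2, 4), (2, 2)]

sums = defaultdict(float)
counts = defaultdict(int)
for group, value in A:
    sums[group] += value
    counts[group] += 1

# B maps each group to the mean of its values
B = {group: sums[group] / counts[group] for group in sums}
# B == {1: 2.5, 2: 3.0}, matching the expected matrix B
```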

How to filter (with dplyr) for all values of a group if variable limit is reached?

Submitted by 牧云@^-^@ on 2019-11-28 05:37:54

Question: Here's the dummy data:

    cases <- rep(1:5, times = 2)
    var1 <- as.numeric(c(450, 100, 250, 999, 200, 500, 980, 10, 700, 1000))
    var2 <- as.numeric(c(111, 222, 333, 444, 424, 634, 915, 12, 105, 152))
    maindata1 <- data.frame(cases, var1, var2)

    df1 <- maindata1 %>% filter(var1 > 950) %>% distinct(cases) %>% select(cases)
    table1 <- maindata1 %>% filter(cases == 2 | cases == 4 | cases == 5) %>% arrange(cases)
    > table1
      cases var1 var2
    1     2  100  222
    2     2  980  915
    3     4  999  444
    4     4  700  105
    5     5  200  424
    6     5 1000  152

I'm trying to
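The intended result (keep every row of any case whose var1 ever exceeds 950) can be sketched outside dplyr as well; a plain-Python version using the question's dummy data:

```python
# The question's dummy data as (case, var1) pairs
cases = [1, 2, 3, 4, 5] * 2
var1 = [450, 100, 250, 999, 200, 500, 980, 10, 700, 1000]
rows = list(zip(cases, var1))

# Cases in which the limit is reached at least once
qualifying = {c for c, v in rows if v > 950}

# Keep all rows belonging to a qualifying case
kept = sorted((c, v) for c, v in rows if c in qualifying)
# Only cases 2, 4 and 5 survive, as in the question's table1
```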

R data.frame: create a new variable

Submitted by 淺唱寂寞╮ on 2019-11-28 05:08:37

Question: I have a data frame with around 1.5 million rows and 5 columns. One variable (VARIABLE) is of the form NATIONALITY_YEAR (e.g. SPAIN_1998), and I want to split it into two columns: one containing the nationality (the part of the name left of the underscore) and one containing the year (the part right of the underscore). I have tried concat.split, which should be the easiest way:

    aa <- concat.split(mydata, "VARIABLE", sep = "_", drop = F)

but after 2 hours running it did not produce any
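Whatever made concat.split slow, the underlying operation is just one split per value. A minimal Python sketch (SPAIN_1998 comes from the question; FRANCE_2001 is an invented second sample):

```python
# Sample VARIABLE values; the second one is invented for illustration
values = ["SPAIN_1998", "FRANCE_2001"]

# Split each value once, at the last underscore
parts = [v.rsplit("_", 1) for v in values]
nationalities = [p[0] for p in parts]
years = [int(p[1]) for p in parts]
# nationalities == ['SPAIN', 'FRANCE']; years == [1998, 2001]
```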

Finding maximum value of one column (by group) and inserting value into another data frame in R

Submitted by 烈酒焚心 on 2019-11-28 02:13:26

All, I was hoping someone could help with an issue that isn't necessarily causing headaches but, as it stands, invites human error in creating a data set for a project I'm working on. The data set is a directed dyad-year data set (A vs. B, B vs. A) covering select pairs of countries for every year between 1950 and 2010. Some countries, like A in my example, are paired with every country in the world, and every country is paired with them; others, like B and C in my example, are paired with just a few countries.
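The mechanical part of the task (take the maximum of one column within each group and attach it to every row of that group) can be sketched in plain Python; the dyad labels and values below are hypothetical:

```python
# (dyad, year, value) rows; data invented for illustration
rows = [("A-B", 1950, 2), ("A-B", 1951, 5), ("B-C", 1950, 1), ("B-C", 1951, 4)]

# Per-group maximum of the value column
group_max = {}
for dyad, _, value in rows:
    group_max[dyad] = max(group_max.get(dyad, float("-inf")), value)

# Attach the group maximum to every row of that group
annotated = [(dyad, year, value, group_max[dyad]) for dyad, year, value in rows]
```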

Assign value to group based on condition in column

Submitted by £可爱£侵袭症+ on 2019-11-27 14:27:12

I have a data frame that looks like the following:

    > df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
                      date  = c(1,2,3,4,5,6,7,8,9),
                      value = c(3,4,3,4,5,6,6,4,9))
    > df
      group date value
    1     1    1     3
    2     1    2     4
    3     1    3     3
    4     2    4     4
    5     2    5     5
    6     2    6     6
    7     3    7     6
    8     3    8     4
    9     3    9     9

I want to create a new column that contains, for each group, the date associated with the value 4 in the value column. The following data frame shows what I hope to accomplish:

      group date value newValue
    1     1    1     3        2
    2     1    2     4        2
    3     1    3     3        2
    4     2    4     4        4
    5     2    5     5        4
    6     2    6     6        4
    7     3    7     6        8
    8     3    8     4        8
    9     3    9     9        8

As we can see, group 1 has the newValue "2"

Remove rows with NaN values

Submitted by 和自甴很熟 on 2019-11-27 12:06:30

Question: Let's say, for example, I have this data:

    data <- c(1,2,3,4,5,6,NaN,5,9,NaN,23,9)
    attr(data, "dim") <- c(6,2)
    data
         [,1] [,2]
    [1,]    1  NaN
    [2,]    2    5
    [3,]    3    9
    [4,]    4  NaN
    [5,]    5   23
    [6,]    6    9

Now I want to remove the rows with NaN values in them: rows 1 and 4. But I don't know where these rows are; in a data set of 100,000+ rows I need to find them with a function and remove each complete row. Can anybody point me in the right direction?

Answer 1: The function complete.cases will tell you where the
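The complete.cases idea, sketched in plain Python on the question's 6x2 data: keep a row only when every entry is a real number.

```python
import math

nan = float("nan")
# The question's 6x2 matrix, row by row
data = [[1, nan], [2, 5], [3, 9], [4, nan], [5, 23], [6, 9]]

# Keep a row only if no entry is NaN (drops rows 1 and 4)
clean = [row for row in data if not any(math.isnan(x) for x in row)]
# clean == [[2, 5], [3, 9], [5, 23], [6, 9]]
```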

Sliding time intervals for time series data in R

Submitted by 霸气de小男生 on 2019-11-27 11:42:52

Question: I am trying to extract interesting statistics from an irregular time series data set, but I am coming up short on finding the right tools for the job. Tools for manipulating regularly sampled time series, or index-based series of any kind, are easy enough to find, but I'm not having much luck with the problems I'm trying to solve. First, a reproducible data set:

    library(zoo)
    set.seed(0)
    nSamples <- 5000
    vecDT <- rexp(nSamples, 3)
    vecTimes <- cumsum(c(0, vecDT))
    vecDrift <- c(0, rnorm(nSamples,
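One way to frame sliding windows over an irregular series: for each observation, look back a fixed amount of *time* rather than a fixed number of samples. A plain-Python sketch of a trailing 1-unit time window (times and values invented for illustration):

```python
import bisect

# Irregularly spaced, sorted observation times and their values; invented data
times = [0.0, 0.4, 1.1, 1.5, 2.0, 2.2]
values = [1, 2, 3, 4, 5, 6]
width = 1.0  # trailing window width in time units

means = []
for i, t in enumerate(times):
    # First index whose time is >= t - width (binary search on sorted times)
    lo = bisect.bisect_left(times, t - width)
    window = values[lo:i + 1]
    means.append(sum(window) / len(window))
# means == [1.0, 1.5, 2.5, 3.5, 4.0, 5.0]
```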

Converting String Array to an Integer Array

Submitted by 南楼画角 on 2019-11-27 08:52:24

Question: Basically, the user enters a sequence via Scanner input: 12, 3, 4, etc. It can be of any length, and the entries must be integers. I want to convert the string input to an integer array, so that int[0] is 12, int[1] is 3, and so on. Any tips or ideas? I was thinking that if charAt(i) == ',' I would take the preceding digits, parse them together, and store the result in the next available slot of the array. But I'm not quite sure how to code that.

Answer (Java Devil): You could read the entire input line from the Scanner, then split the line by , ; then you have a String[], and you can parse each number into int[]
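The split-then-parse approach the answer describes, sketched in Python for consistency with the other examples here (the same shape works in Java with String.split and Integer.parseInt):

```python
line = "12, 3, 4"  # the user's comma-separated input

# Split on commas, then parse each token; int() tolerates surrounding spaces
numbers = [int(token) for token in line.split(",")]
# numbers == [12, 3, 4]
```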

Categorizing variables in SAS using a range system

Submitted by 人走茶凉 on 2019-11-27 08:51:56

Question: I have numeric salary values for different employees, and I want to break the range up into categories. However, I do not want a new column; rather, I want to format the existing salary column with this range scheme:

    At least $20,000 but less than $100,000
    At least $100,000 and up to $500,000 - >$100,000
    Missing - Missing salary
    Any other value - Invalid salary

I've done something similar with gender. I just want to use the proc print and format commands to show salary and gender.
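In any language the scheme reduces to mapping each salary to a range label; in SAS itself this would be a custom format applied to the existing column rather than a new variable. A Python sketch of the mapping, where the labels are illustrative guesses since the question's range text is ambiguous:

```python
def salary_label(salary):
    """Map a salary to a range label; labels are illustrative guesses."""
    if salary is None:
        return "Missing salary"
    if 20_000 <= salary < 100_000:
        return "$20,000-<$100,000"
    if 100_000 <= salary <= 500_000:
        return ">$100,000"
    return "Invalid salary"

labels = [salary_label(s) for s in (50_000, 250_000, None, 5_000)]
# labels == ['$20,000-<$100,000', '>$100,000', 'Missing salary', 'Invalid salary']
```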

Extract non null elements from a list in R

Submitted by 此生再无相见时 on 2019-11-27 05:42:49

Question: I have a list like this:

    x = list(a = 1:4, b = 3:10, c = NULL)
    x
    #$a
    #[1] 1 2 3 4
    #
    #$b
    #[1] 3 4 5 6 7 8 9 10
    #
    #$c
    #NULL

and I want to extract all elements that are not NULL. How can this be done? Thanks.

Answer 1: Here's another option:

    Filter(Negate(is.null), x)

Answer 2: What about:

    x[!unlist(lapply(x, is.null))]

Here is a brief description of what is going on. lapply tells us which elements are NULL:

    R> lapply(x, is.null)
    $a
    [1] FALSE
    $b
    [1] FALSE
    $c
    [1] TRUE

Next we convert the list into a vector:
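For contrast, the same non-null filter in Python, where None plays the role of R's NULL:

```python
# The question's list x, as a Python dict; None stands in for R's NULL
x = {"a": [1, 2, 3, 4], "b": list(range(3, 11)), "c": None}

# Keep only entries whose value is not None (like Filter(Negate(is.null), x))
non_null = {k: v for k, v in x.items() if v is not None}
# non_null has keys 'a' and 'b' only
```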