data-manipulation

Average of subgroup of 2nd column, grouped by 1st column

Submitted by 北城以北 on 2019-11-28 11:54:09

Question: Suppose I have a matrix A whose first column is a group label, and I want the average of the second column within each group. So from A I want to construct B:

    A =
    1 2
    1 3
    2 4
    2 2

    B =
    1 2.5
    2 3

The best I have managed so far is a long for/if loop that uses the averaging function to get to B, but I suspect there is a simpler method. Is there?

Answer 1: I hadn't used accumarray before, so due to the comment by @Dan I decided to give it a try. At first I tried a naive version and used histc to count
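As an aside, the grouped-mean logic the question asks for can be sketched in plain Python (this mirrors the question's matrix A; it is an illustration of the idea, not the MATLAB accumarray answer itself):

```python
from collections import defaultdict

# Rows of the question's matrix A as (group, value) pairs
A = [(1, 2), (1, 3), (2, 4), (2, 2)]

sums = defaultdict(float)
counts = defaultdict(int)
for group, value in A:
    sums[group] += value
    counts[group] += 1

# B maps each group to the mean of its values
B = {group: sums[group] / counts[group] for group in sums}
# B == {1: 2.5, 2: 3.0}, matching the expected matrix B
```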

How to filter (with dplyr) for all values of a group if variable limit is reached?

Submitted by 牧云@^-^@ on 2019-11-28 05:37:54

Question: Here's the dummy data:

    cases <- rep(1:5, times = 2)
    var1 <- as.numeric(c(450, 100, 250, 999, 200, 500, 980, 10, 700, 1000))
    var2 <- as.numeric(c(111, 222, 333, 444, 424, 634, 915, 12, 105, 152))
    maindata1 <- data.frame(cases, var1, var2)

    df1 <- maindata1 %>% filter(var1 > 950) %>% distinct(cases) %>% select(cases)
    table1 <- maindata1 %>% filter(cases == 2 | cases == 4 | cases == 5) %>% arrange(cases)
    > table1
      cases var1 var2
    1     2  100  222
    2     2  980  915
    3     4  999  444
    4     4  700  105
    5     5  200  424
    6     5 1000  152

I'm trying to
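The intended result (keep every row of any case whose var1 ever exceeds 950) can be sketched outside dplyr as well; a plain-Python version using the question's dummy data:

```python
# The question's dummy data as (case, var1) pairs
cases = [1, 2, 3, 4, 5] * 2
var1 = [450, 100, 250, 999, 200, 500, 980, 10, 700, 1000]
rows = list(zip(cases, var1))

# Cases in which the limit is reached at least once
qualifying = {c for c, v in rows if v > 950}

# Keep all rows belonging to a qualifying case
kept = sorted((c, v) for c, v in rows if c in qualifying)
# Only cases 2, 4 and 5 survive, as in the question's table1
```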

R data.frame: create a new variable

Submitted by 淺唱寂寞╮ on 2019-11-28 05:08:37

Question: I have a data frame with around 1.5 million rows and 5 columns. One variable (VARIABLE) is of the form NATIONALITY_YEAR (e.g. SPAIN_1998), and I want to split it into two columns: one containing the nationality (the part of the name left of the underscore) and one containing the year (the part right of the underscore). I have tried concat.split, which should be the easiest way:

    aa <- concat.split(mydata, "VARIABLE", sep = "_", drop = F)

but after 2 hours running it did not produce any
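Whatever made concat.split slow, the underlying operation is just one split per value. A minimal Python sketch (SPAIN_1998 comes from the question; FRANCE_2001 is an invented second sample):

```python
# Sample VARIABLE values; the second one is invented for illustration
values = ["SPAIN_1998", "FRANCE_2001"]

# Split each value once, at the last underscore
parts = [v.rsplit("_", 1) for v in values]
nationalities = [p[0] for p in parts]
years = [int(p[1]) for p in parts]
# nationalities == ['SPAIN', 'FRANCE']; years == [1998, 2001]
```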

Finding maximum value of one column (by group) and inserting value into another data frame in R

Submitted by 烈酒焚心 on 2019-11-28 02:13:26

All, I was hoping someone could help with an issue that isn't necessarily causing headaches but, as it stands, invites human error in creating a data set for a project I'm working on. The data set is a directed dyad-year data set (A vs. B, B vs. A) covering select pairs of countries for every year between 1950 and 2010. Some countries, like A in my example, are paired with every country in the world, and every country is paired with them; others, like B and C in my example, are paired with just a few countries.
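The mechanical part of the task (take the maximum of one column within each group and attach it to every row of that group) can be sketched in plain Python; the dyad labels and values below are hypothetical:

```python
# (dyad, year, value) rows; data invented for illustration
rows = [("A-B", 1950, 2), ("A-B", 1951, 5), ("B-C", 1950, 1), ("B-C", 1951, 4)]

# Per-group maximum of the value column
group_max = {}
for dyad, _, value in rows:
    group_max[dyad] = max(group_max.get(dyad, float("-inf")), value)

# Attach the group maximum to every row of that group
annotated = [(dyad, year, value, group_max[dyad]) for dyad, year, value in rows]
```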

Assign value to group based on condition in column

Submitted by £可爱£侵袭症+ on 2019-11-27 14:27:12

I have a data frame that looks like the following:

    > df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
                      date  = c(1,2,3,4,5,6,7,8,9),
                      value = c(3,4,3,4,5,6,6,4,9))
    > df
      group date value
    1     1    1     3
    2     1    2     4
    3     1    3     3
    4     2    4     4
    5     2    5     5
    6     2    6     6
    7     3    7     6
    8     3    8     4
    9     3    9     9

I want to create a new column that contains, for each group, the date associated with the value 4 in the value column. The following data frame shows what I hope to accomplish:

      group date value newValue
    1     1    1     3        2
    2     1    2     4        2
    3     1    3     3        2
    4     2    4     4        4
    5     2    5     5        4
    6     2    6     6        4
    7     3    7     6        8
    8     3    8     4        8
    9     3    9     9        8

As we can see, group 1 has the newValue "2"

Remove rows with NaN values

Submitted by 和自甴很熟 on 2019-11-27 12:06:30

Question: Let's say, for example, I have this data:

    data <- c(1,2,3,4,5,6,NaN,5,9,NaN,23,9)
    attr(data, "dim") <- c(6,2)
    data
         [,1] [,2]
    [1,]    1  NaN
    [2,]    2    5
    [3,]    3    9
    [4,]    4  NaN
    [5,]    5   23
    [6,]    6    9

Now I want to remove the rows with NaN values in them: rows 1 and 4. But I don't know where these rows are; in a data set of 100,000+ rows I need to find them with a function and remove each complete row. Can anybody point me in the right direction?

Answer 1: The function complete.cases will tell you where the
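The complete.cases idea, sketched in plain Python on the question's 6x2 data: keep a row only when every entry is a real number.

```python
import math

nan = float("nan")
# The question's 6x2 matrix, row by row
data = [[1, nan], [2, 5], [3, 9], [4, nan], [5, 23], [6, 9]]

# Keep a row only if no entry is NaN (drops rows 1 and 4)
clean = [row for row in data if not any(math.isnan(x) for x in row)]
# clean == [[2, 5], [3, 9], [5, 23], [6, 9]]
```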

Sliding time intervals for time series data in R

Submitted by 霸气de小男生 on 2019-11-27 11:42:52

Question: I am trying to extract interesting statistics from an irregular time series data set, but I am coming up short on finding the right tools for the job. Tools for manipulating regularly sampled time series, or index-based series of any kind, are easy enough to find, but I'm not having much luck with the problems I'm trying to solve. First, a reproducible data set:

    library(zoo)
    set.seed(0)
    nSamples <- 5000
    vecDT <- rexp(nSamples, 3)
    vecTimes <- cumsum(c(0, vecDT))
    vecDrift <- c(0, rnorm(nSamples,
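One way to frame sliding windows over an irregular series: for each observation, look back a fixed amount of *time* rather than a fixed number of samples. A plain-Python sketch of a trailing 1-unit time window (times and values invented for illustration):

```python
import bisect

# Irregularly spaced, sorted observation times and their values; invented data
times = [0.0, 0.4, 1.1, 1.5, 2.0, 2.2]
values = [1, 2, 3, 4, 5, 6]
width = 1.0  # trailing window width in time units

means = []
for i, t in enumerate(times):
    # First index whose time is >= t - width (binary search on sorted times)
    lo = bisect.bisect_left(times, t - width)
    window = values[lo:i + 1]
    means.append(sum(window) / len(window))
# means == [1.0, 1.5, 2.5, 3.5, 4.0, 5.0]
```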

Converting String Array to an Integer Array

Submitted by 南楼画角 on 2019-11-27 08:52:24

Question: Basically, the user enters a sequence via Scanner input: 12, 3, 4, etc. It can be of any length, and the entries must be integers. I want to convert the string input to an integer array, so that int[0] is 12, int[1] is 3, and so on. Any tips or ideas? I was thinking that if charAt(i) == ',' I would take the preceding digits, parse them together, and store the result in the next available slot of the array. But I'm not quite sure how to code that.

Answer (Java Devil): You could read the entire input line from the Scanner, then split the line by , ; then you have a String[], and you can parse each number into int[]
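The split-then-parse approach the answer describes, sketched in Python for consistency with the other examples here (the same shape works in Java with String.split and Integer.parseInt):

```python
line = "12, 3, 4"  # the user's comma-separated input

# Split on commas, then parse each token; int() tolerates surrounding spaces
numbers = [int(token) for token in line.split(",")]
# numbers == [12, 3, 4]
```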

Categorizing variables in SAS using a range system

Submitted by 人走茶凉 on 2019-11-27 08:51:56

Question: I have numeric salary values for different employees, and I want to break the range up into categories. However, I do not want a new column; rather, I want to format the existing salary column with this range scheme:

    At least $20,000 but less than $100,000
    At least $100,000 and up to $500,000 - >$100,000
    Missing - Missing salary
    Any other value - Invalid salary

I've done something similar with gender. I just want to use the proc print and format commands to show salary and gender.
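In any language the scheme reduces to mapping each salary to a range label; in SAS itself this would be a custom format applied to the existing column rather than a new variable. A Python sketch of the mapping, where the labels are illustrative guesses since the question's range text is ambiguous:

```python
def salary_label(salary):
    """Map a salary to a range label; labels are illustrative guesses."""
    if salary is None:
        return "Missing salary"
    if 20_000 <= salary < 100_000:
        return "$20,000-<$100,000"
    if 100_000 <= salary <= 500_000:
        return ">$100,000"
    return "Invalid salary"

labels = [salary_label(s) for s in (50_000, 250_000, None, 5_000)]
# labels == ['$20,000-<$100,000', '>$100,000', 'Missing salary', 'Invalid salary']
```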

Extract non null elements from a list in R

Submitted by 此生再无相见时 on 2019-11-27 05:42:49

Question: I have a list like this:

    x = list(a = 1:4, b = 3:10, c = NULL)
    x
    #$a
    #[1] 1 2 3 4
    #
    #$b
    #[1] 3 4 5 6 7 8 9 10
    #
    #$c
    #NULL

and I want to extract all elements that are not NULL. How can this be done? Thanks.

Answer 1: Here's another option:

    Filter(Negate(is.null), x)

Answer 2: What about:

    x[!unlist(lapply(x, is.null))]

Here is a brief description of what is going on. lapply tells us which elements are NULL:

    R> lapply(x, is.null)
    $a
    [1] FALSE
    $b
    [1] FALSE
    $c
    [1] TRUE

Next we convert the list into a vector:
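For contrast, the same non-null filter in Python, where None plays the role of R's NULL:

```python
# The question's list x, as a Python dict; None stands in for R's NULL
x = {"a": [1, 2, 3, 4], "b": list(range(3, 11)), "c": None}

# Keep only entries whose value is not None (like Filter(Negate(is.null), x))
non_null = {k: v for k, v in x.items() if v is not None}
# non_null has keys 'a' and 'b' only
```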