plyr

multiply median for groups separately in R by condition

老子叫甜甜 submitted on 2019-12-11 12:35:16
Question: I have this dataset df=structure(list(Dt = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L), .Label = c("2018-02-20 00:00:00.000", "2018-02-21 00:00:00.000",

How to read nested JSON structure in R?

≡放荡痞女 submitted on 2019-12-11 12:23:00
Question: I have some JSON that looks like this: "total_rows":141,"offset":0,"rows":[ {"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}}, {"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}}, {"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}}, ... In R, what would be the easiest way to produce a scatter plot of SP$Sale_Price versus

Revalue attributes from multiple columns

自古美人都是妖i submitted on 2019-12-11 12:13:27
Question: I have a dataset like the following. dat1 <- read.table(header=TRUE, text=" ID Pa Gu Ta 8645 Rel345 Gel294 Tel452 6228 Rel345 Gel294 Tel467 5830 Rel345 Gel294 Tel467 1844 Rel345 Gel295 Tel467 4461 Rel345 Gel295 Tel467 2119 Rel345 Gel294 Tel452 1821 Rel345 Gel294 Tel467 6851 Rel345 Gel294 Tel467 4214 Rel345 Gel294 Tel452 2589 Rel346 Gel294 Tel467 2116 Rel347 Gel294 Tel452 8523 Rel348 Gel295 Tel468 2603 Rel348 Gel295 Tel468 2801 Rel348 Gel295 Tel452 1485 Rel348 Gel295 Tel468 2116 Rel348 Gel295

How to create R output like a confusion matrix table

ぐ巨炮叔叔 submitted on 2019-12-11 11:57:01
Question: I have two directories: the first is named "model" and the second "test". Both contain the same list of file names, but the file contents differ. Each directory holds the same number of files, 37. Here is an example of the content of one file. First file from the model directory, file name: Model_A5B45 data 1 papaya | durian | orange | grapes 2 orange 3 grapes 4 banana | durian 5 tomato 6 apple | tomato 7 apple 8 mangostine 9

count shared occurrences and remove duplicates

瘦欲@ submitted on 2019-12-11 10:36:43
Question: I have this data.frame: df <- read.table(text= " section to from time a 1 5 9 a 2 5 9 a 1 5 10 a 2 6 10 a 2 7 11 a 2 7 12 a 3 7 12 a 4 7 12 a 4 6 13 ", header = TRUE) Each row identifies the simultaneous occurrence of an id in to and from at a timepoint time. It is basically a time-explicit network of ids in to and from. I want to know which to ids shared a from id within a particular time range, which is 2. In other words, I want to know if ids 1 and 2 in to both went to coffee shop 5 within two

subsetting df with repeated sequences

流过昼夜 submitted on 2019-12-11 10:35:57
Question: I have searched high and low for a solution to this, but I cannot find one. My dataframe (essentially a table of the no. 1 sports team by date) has numerous occasions where one or more teams "reappear" in the data. I want to pull out the start (or end) date of each period at no. 1 per team. An example of the data could be: x1<- as.Date("2013-12-31") adddate1 <- 1:length(teams1) dates1 <- x1 + adddate1 teams2 <- c(rep("w", 3), rep("c", 8), rep("w", 4)) x2<- as.Date("2012-12-31")
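One way to approach this kind of question is run-length encoding. The following is a minimal base R sketch, not the asker's accepted solution. The `teams` vector mirrors the `teams2` fragment in the excerpt; the `dates` vector is an assumption (one consecutive day per row, starting the day after 2012-12-31), since the excerpt is cut off before the dates are built.

```r
# Hypothetical input modelled on the excerpt: one row per day, with the
# team holding the no. 1 spot on that day.
teams <- c(rep("w", 3), rep("c", 8), rep("w", 4))
dates <- as.Date("2012-12-31") + seq_along(teams)  # assumed daily dates

# rle() collapses consecutive repeats into runs (values + lengths).
runs <- rle(teams)

# Index of the last and first row of each run.
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# One row per uninterrupted period at no. 1.
periods <- data.frame(
  team  = runs$values,
  start = dates[starts],
  end   = dates[ends]
)
print(periods)
```

Because `rle()` only merges adjacent repeats, a team that returns to no. 1 later (here "w") gets a separate row for each period.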

summarising character values with ddply

可紊 submitted on 2019-12-11 09:26:49
Question: I have the following dataframe: df <- structure(list(year = c(1986L, 1987L, 1991L, 1991L, 1991L, 1991L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L), knmilocatie = structure(c(4L, 16L, 10L, 12L, 9L, 20L, 12L, 12L, 25L, 9L, 30L, 26L,

R: Percentile calculations on subsets of data

断了今生、忘了曾经 submitted on 2019-12-11 08:37:15
Question: I have a data set which contains the following identifiers: an rscore, gvkey, sic2, year, and cdom. What I am looking to do is calculate percentile ranks based on summed rscores for all temporal spans (~1500) for a given gvkey, and then calculate percentile ranks in a given temporal time span and sic2 based on gvkey. Calculating the percentiles for all temporal time spans is a fairly quick process, however once I add in calculating the sic2 percentile ranks it's fairly slow, but we are likely
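As a rough illustration of grouped percentile ranks, here is a base R sketch. Everything in it is an assumption made for illustration: the `scores` data frame is hypothetical, the temporal spans (`year`, `cdom`) are left out, and the percentile rank is defined as rank divided by group size, which may differ from the definition the asker uses.

```r
# Hypothetical data: several rscore observations per gvkey, each gvkey
# belonging to one sic2 industry code.
scores <- data.frame(
  gvkey  = c(1, 1, 2, 2, 3, 3, 4, 4),
  sic2   = c(10, 10, 10, 10, 20, 20, 20, 20),
  rscore = c(1, 2, 5, 5, 0, 1, 3, 4)
)

# Sum rscore per gvkey (sic2 is carried along as a grouping column).
totals <- aggregate(rscore ~ gvkey + sic2, data = scores, FUN = sum)

# Assumed definition: percentile rank = rank / number of observations.
pct_rank <- function(v) rank(v, ties.method = "average") / length(v)

# Rank across all firms, then within each sic2 group via ave().
totals$pct_all  <- pct_rank(totals$rscore)
totals$pct_sic2 <- ave(totals$rscore, totals$sic2, FUN = pct_rank)
print(totals)
```

`ave()` applies the function group by group in a single vectorised call, which tends to be faster than looping over subsets; whether it resolves the slowness described in the question is not something the excerpt lets us confirm.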

Select the most common value of a column based on matched pairs from two columns using `ddply`

喜你入骨 submitted on 2019-12-11 07:33:38
Question: I'm trying to use ddply (a plyr function) to sort and identify the most frequent interaction type between any unique pairs of users from social media data of the following form: from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D') to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C') interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like
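The idea of picking the most frequent value per pair can be sketched without plyr. The example below uses base R `aggregate()` rather than `ddply`, on a small hypothetical `interactions` data frame (not the asker's data, which is cut off in the excerpt). It treats pairs as directed (A to B differs from B to A), which is an assumption, and ties go to the alphabetically first type.

```r
# Hypothetical sample of directed interactions between users.
interactions <- data.frame(
  from = c("A", "A", "A", "B", "B"),
  to   = c("B", "B", "B", "A", "C"),
  type = c("like", "like", "comment", "share", "like"),
  stringsAsFactors = FALSE
)

# Most frequent value in a vector; which.max() keeps the first maximum,
# and table() orders names alphabetically, so ties resolve alphabetically.
most_common <- function(x) names(which.max(table(x)))

# One row per (from, to) pair with its most frequent interaction type.
result <- aggregate(type ~ from + to, data = interactions, FUN = most_common)
print(result)
```

The same `most_common` helper could be passed to a grouped summary in plyr; that variant is not shown here because the asker's exact `ddply` call is not visible in the excerpt.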

Transposing a data frame [duplicate]

不想你离开。 submitted on 2019-12-11 06:42:43
Question: This question already has answers here: Reshape wide format, to multi-column long format (4 answers). Closed 6 years ago. I have a question about reshaping (if that's the right word) a data frame to a transposed version of it. I want to take something like:

    A  B  C
    1  6  1
    1 18  1
    1 21  1
    3 18  1
    3 21  1
    4  6  1
    4 18  1
    4 20  1
    4 21  1

And turn it into a dataframe like:

    A B_1 C_1 B_2 C_2 B_3 C_3 ...
    1   6   1  18   1  21   1
    3  18   1  21   1
    4   6   1  18   1  20   1  21   1

Is there some go-to function in R that I'm unaware of or
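A possible approach, sketched in base R: number the rows within each value of `A`, then spread `B` and `C` with `reshape()`. The data frame below is typed in from the example in the question; the helper column `idx` is a name introduced here for illustration. This is a sketch of one option, not necessarily what the linked answers recommend.

```r
# Example data from the question.
df <- data.frame(
  A = c(1, 1, 1, 3, 3, 4, 4, 4, 4),
  B = c(6, 18, 21, 18, 21, 6, 18, 20, 21),
  C = 1
)

# Within each A, number the rows 1, 2, 3, ... (idx is a helper column).
df$idx <- ave(df$A, df$A, FUN = seq_along)

# Spread B and C into B_1, C_1, B_2, C_2, ... with one row per A.
wide <- reshape(df, idvar = "A", timevar = "idx",
                direction = "wide", sep = "_")
print(wide)
```

Groups with fewer rows than the largest group are padded with `NA`, so `A = 3` ends up with `NA` in `B_3`, `C_3`, `B_4`, and `C_4`.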