plyr | 易学教程 (Easy Learning Tutorial)

multiply median for groups separately in R by condition

Question: I have this dataset: df=structure(list(Dt = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L), .Label = c("2018-02-20 00:00:00.000", "2018-02-21 00:00:00.000",

How to read a nested JSON structure in R?

Question: I have some JSON that looks like this: "total_rows":141,"offset":0,"rows":[ {"id":"1","key":"a","value":{"SP$Sale_Price":"240000","CONTRACTDATE$Contract_Date":"2006-10-26T05:00:00"}}, {"id":"2","key":"b","value":{"SP$Sale_Price":"2000000","CONTRACTDATE$Contract_Date":"2006-08-22T05:00:00"}}, {"id":"3","key":"c","value":{"SP$Sale_Price":"780000","CONTRACTDATE$Contract_Date":"2007-01-18T06:00:00"}}, ... In R, what would be the easiest way to produce a scatter plot of SP$Sale_Price versus

Revalue attributes from multiple columns

Question: I have a dataset like the following: dat1 <- read.table(header=TRUE, text=" ID Pa Gu Ta 8645 Rel345 Gel294 Tel452 6228 Rel345 Gel294 Tel467 5830 Rel345 Gel294 Tel467 1844 Rel345 Gel295 Tel467 4461 Rel345 Gel295 Tel467 2119 Rel345 Gel294 Tel452 1821 Rel345 Gel294 Tel467 6851 Rel345 Gel294 Tel467 4214 Rel345 Gel294 Tel452 2589 Rel346 Gel294 Tel467 2116 Rel347 Gel294 Tel452 8523 Rel348 Gel295 Tel468 2603 Rel348 Gel295 Tel468 2801 Rel348 Gel295 Tel452 1485 Rel348 Gel295 Tel468 2116 Rel348 Gel295

How to create R output like a confusion matrix table

Question: I have two directories: the first is named "model" and the second "test". Both contain the same list of file names (37 files each), but the file contents differ. Here is an example of the content of one file. First file from the model directory, file name: Model_A5B45 data 1 papaya | durian | orange | grapes 2 orange 3 grapes 4 banana | durian 5 tomato 6 apple | tomato 7 apple 8 mangostine 9

count shared occurrences and remove duplicates

Question: I have this data.frame:

    df <- read.table(text = "
    section to from time
    a 1 5 9
    a 2 5 9
    a 1 5 10
    a 2 6 10
    a 2 7 11
    a 2 7 12
    a 3 7 12
    a 4 7 12
    a 4 6 13
    ", header = TRUE)

Each row records the simultaneous occurrence of an id in to and an id in from at time point time; basically a time-explicit network of ids in to and from. I want to know which to ids shared a from id within a particular time range, namely 2. In other words, I want to know whether ids 1 and 2 in to both went to coffee shop 5 within two
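One possible approach, sketched in base R rather than plyr, is a self-join of df on section and from, keeping pairs of distinct to ids whose times are close enough. It assumes that "within two" means an absolute time difference of at most 2, since the excerpt is cut off at that point; the names m, keep, shared, to_1 and to_2 are illustrative.

```r
# Data from the question
df <- read.table(text = "
section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13
", header = TRUE)

# Self-join: every pair of rows sharing the same section and 'from' id
m <- merge(df, df, by = c("section", "from"), suffixes = c("_1", "_2"))

# Assumption: "within two" means |time_1 - time_2| <= 2.
# to_1 < to_2 drops self-pairs and keeps each unordered pair once.
keep <- m$to_1 < m$to_2 & abs(m$time_1 - m$time_2) <= 2

# Remove duplicate pairs that co-occurred more than once
shared <- unique(m[keep, c("section", "from", "to_1", "to_2")])
shared <- shared[order(shared$from, shared$to_1, shared$to_2), ]
rownames(shared) <- NULL
print(shared)
```

Under this reading, ids 1 and 2 share from id 5, and ids 2, 3 and 4 pairwise share from id 7; ids 2 and 4 do not count for from id 6 because their times (10 and 13) are 3 apart.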

subsetting df with repeated sequences

Question: I have searched high and low for a solution to this, but I cannot find one. My data frame (essentially a table of the no. 1 sports team by date) has numerous occasions where one or several teams "reappear" in the data. I want to pull out the start (or end) date of each period at no. 1 per team. An example of the data could be:

    x1 <- as.Date("2013-12-31")
    adddate1 <- 1:length(teams1)
    dates1 <- x1 + adddate1
    teams2 <- c(rep("w", 3), rep("c", 8), rep("w", 4))
    x2 <- as.Date("2012-12-31")
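A minimal base R sketch using rle(), which collapses consecutive repeats so that a team that drops out and later returns gets a separate spell. It assumes the dates for teams2 are built the same way as dates1 (one day per row, starting the day after x2); the names dates2, r, starts, ends and runs are illustrative.

```r
teams2 <- c(rep("w", 3), rep("c", 8), rep("w", 4))
x2 <- as.Date("2012-12-31")
# Assumption: dates2 mirrors dates1 (x2 plus 1, 2, ..., n days)
dates2 <- x2 + seq_along(teams2)

# Run-length encoding: one entry per uninterrupted spell at no. 1
r <- rle(teams2)
ends <- cumsum(r$lengths)          # row index where each spell ends
starts <- ends - r$lengths + 1     # row index where each spell starts

runs <- data.frame(team = r$values,
                   start = dates2[starts],
                   end = dates2[ends],
                   stringsAsFactors = FALSE)
print(runs)
```

Each row of runs is one period at no. 1, so team "w" appears twice here.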

summarising character values with ddply

Question: I have the following dataframe: df <- structure(list(year = c(1986L, 1987L, 1991L, 1991L, 1991L, 1991L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L), knmilocatie = structure(c(4L, 16L, 10L, 12L, 9L, 20L, 12L, 12L, 25L, 9L, 30L, 26L,

R: Percentile calculations on subsets of data

Question: I have a data set containing the following identifiers: an rscore, gvkey, sic2, year, and cdom. I want to calculate percentile ranks based on summed rscores for all temporal spans (~1500) for a given gvkey, and then calculate percentile ranks within a given time span and sic2, based on gvkey. Calculating the percentiles for all time spans is fairly quick; however, once I add the sic2 percentile ranks it becomes fairly slow, but we are likely
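A small sketch of grouped percentile ranks in base R, on a hypothetical toy data set, since the real data and the exact span definitions are not shown in the excerpt. It assumes a percentile rank defined as rank / n with ties taking the highest rank (the empirical CDF value); other definitions exist. ave() applies the same function within each sic2 group without an explicit loop; pct_rank, d and totals are illustrative names.

```r
# Hypothetical toy data: two years per gvkey, two sic2 industries
d <- data.frame(gvkey  = c(1, 1, 2, 2, 3, 3, 4, 4),
                sic2   = c(10, 10, 10, 10, 20, 20, 20, 20),
                year   = rep(c(2001, 2002), 4),
                rscore = c(1, 2, 3, 4, 2, 2, 5, 1))

# Assumed definition: share of values <= x, same as ecdf(x)(x)
pct_rank <- function(x) rank(x, ties.method = "max") / length(x)

# Sum rscore per gvkey (over the whole hypothetical span)
totals <- aggregate(rscore ~ gvkey + sic2, data = d, FUN = sum)
totals <- totals[order(totals$gvkey), ]

# Percentile rank across all firms, then within each sic2 group
totals$pct_all  <- pct_rank(totals$rscore)
totals$pct_sic2 <- ave(totals$rscore, totals$sic2, FUN = pct_rank)
print(totals)
```

To handle many time spans, a span identifier (hypothetical here) could be added as another grouping variable in both aggregate() and ave(); ave() accepts several grouping vectors, e.g. ave(x, span, sic2, FUN = pct_rank).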

Select the most common value of a column based on matched pairs from two columns using `ddply`

Question: I'm trying to use ddply (a plyr function) to sort and identify the most frequent interaction type between any unique pair of users, from social media data of the following form:

    from <- c('A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D')
    to <- c('B', 'B', 'D', 'A', 'C', 'C', 'D', 'A', 'D', 'B', 'A', 'B', 'B', 'A', 'C')
    interaction_type <- c('like', 'comment', 'share', 'like', 'like', 'like', 'comment', 'like', 'like', 'share', 'like', 'comment', 'like', 'share', 'like
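One way to do this in base R instead of ddply, sketched on a small hypothetical data set because the question's vectors are cut off in the excerpt. It assumes that A-B and B-A count as the same pair, so the two user ids are sorted with pmin()/pmax() before grouping; drop that step if pairs should be directed. Ties go to whichever interaction type sorts first alphabetically, because which.max() returns the first maximum of the sorted table(). The names pair, most_common and res are illustrative.

```r
# Hypothetical toy data in the same shape as the question
from <- c("A", "A", "A", "B", "B", "C")
to   <- c("B", "B", "C", "A", "C", "B")
interaction_type <- c("like", "like", "share", "comment", "like", "like")

# Assumption: pairs are unordered, so A-B and B-A are merged
pair <- paste(pmin(from, to), pmax(from, to), sep = "-")
d <- data.frame(pair, interaction_type, stringsAsFactors = FALSE)

# Most frequent value; on ties, the alphabetically first type wins
most_common <- function(x) names(which.max(table(x)))

res <- aggregate(interaction_type ~ pair, data = d, FUN = most_common)
print(res)
```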

Transposing a data frame [duplicate]

Question: This question already has answers here: Reshape wide format, to multi-column long format (4 answers). Closed 6 years ago. I have a question about reshaping (if that's the right word) a data frame into a transposed version of it. I want to take something like:

    A  B C
    1  6 1
    1 18 1
    1 21 1
    3 18 1
    3 21 1
    4  6 1
    4 18 1
    4 20 1
    4 21 1

and turn it into a data frame like:

    A B_1 C_1 B_2 C_2 B_3 C_3 ...
    1   6   1  18   1  21   1
    3  18   1  21   1
    4   6   1  18   1  20   1  21   1

Is there some go-to function in R that I'm unaware of or
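Base R's reshape() can produce this layout once each row carries a counter within its A group. A minimal sketch on the data from the question; the counter column name idx is illustrative. Groups with fewer rows get NA in the trailing columns (for example, A = 3 has no B_3 or C_3).

```r
df <- data.frame(A = c(1, 1, 1, 3, 3, 4, 4, 4, 4),
                 B = c(6, 18, 21, 18, 21, 6, 18, 20, 21),
                 C = rep(1, 9))

# Number the rows 1, 2, 3, ... within each value of A
df$idx <- ave(seq_along(df$A), df$A, FUN = seq_along)

# Long to wide: one row per A, columns B_1, C_1, B_2, C_2, ...
wide <- reshape(df, idvar = "A", timevar = "idx", direction = "wide", sep = "_")
rownames(wide) <- NULL
print(wide)
```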