plyr

Extract rows with highest and lowest values from a data frame

ぐ巨炮叔叔 submitted on 2021-02-18 07:01:47
Question: I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation. I need to write a function that removes some number (2, 5 or 10) of rows with the highest and lowest values in a specified column from a data frame and puts them into another data frame, and does this for each combination of two factors (in my case, each day and server). Up to this point, I have done the following steps (MWE using the esoph example dataset). I

Aggregation and percentage calculation by groups

試著忘記壹切 submitted on 2021-02-08 15:28:29
Question: I have a dataset in R of student weekly allowances by class, which looks like:
Year ID  Class     Allowance
2013 123 Freshman  100
2013 234 Freshman  110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior    250
2014 678 Junior    100
2014 789 Junior    230
2014 890 Freshman  110
2014 891 Freshman  250
2014 892 Sophomore 220
How can I summarize the results by group (Year/Class) to get the sum and % (by group)? Getting the sum seems easy with ddply, but I just couldn't get the % by group part right. It works for
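The excerpt is cut off before any answer, so the following is only a minimal sketch of one way to get both figures with plyr. It assumes that "% by group" means each Year/Class total as a share of that year's overall total; the column names `total` and `pct` are illustrative and not from the original question.

```r
library(plyr)

# Data copied from the question above
df <- data.frame(
  Year = c(2013, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2014),
  ID = c(123, 234, 345, 456, 567, 678, 789, 890, 891, 892),
  Class = c("Freshman", "Freshman", "Sophomore", "Sophomore", "Junior",
            "Junior", "Junior", "Freshman", "Freshman", "Sophomore"),
  Allowance = c(100, 110, 150, 200, 250, 100, 230, 110, 250, 220)
)

# Step 1: sum of allowances per Year/Class group
sums <- ddply(df, .(Year, Class), summarise, total = sum(Allowance))

# Step 2: within each Year, express each group's total as a percentage
# of the year's total (assumed interpretation of "% by group")
sums <- ddply(sums, .(Year), transform, pct = 100 * total / sum(total))

print(sums)
```

Splitting the work into two `ddply` calls keeps the denominator explicit: the second call groups by Year only, so `sum(total)` is the yearly total rather than the group total.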

How to divide a given time series dataset into 4 hour window in R

房东的猫 submitted on 2021-02-08 08:21:01
Question: I have a time series dataframe like this for a given day. Datetime <- c("2015-09-29 00:00:13", "2015-09-29 00:45:00", "2015-09-29 02:53:20", "2015-09-29 03:22:18", "2015-09-29 05:42:10", "2015-09-29 05:55:50", "2015-09-29 06:14:10", "2015-09-29 07:42:16", "2015-09-29 08:31:15", "2015-09-29 09:13:10", "2015-09-29 11:45:14", "2015-09-29 11:56:00", "2015-09-29 13:44:00", "2015-09-29 14:41:20", "2015-09-29 15:33:10", "2015-09-29 15:24:00", "2015-09-29 17:24:12", "2015-09-29 17:28:16", "2015-09-29

How to divide a given time series dataset into 4 hour window in R

巧了我就是萌 submitted on 2021-02-08 08:20:10
Question: I have a time series dataframe like this for a given day. Datetime <- c("2015-09-29 00:00:13", "2015-09-29 00:45:00", "2015-09-29 02:53:20", "2015-09-29 03:22:18", "2015-09-29 05:42:10", "2015-09-29 05:55:50", "2015-09-29 06:14:10", "2015-09-29 07:42:16", "2015-09-29 08:31:15", "2015-09-29 09:13:10", "2015-09-29 11:45:14", "2015-09-29 11:56:00", "2015-09-29 13:44:00", "2015-09-29 14:41:20", "2015-09-29 15:33:10", "2015-09-29 15:24:00", "2015-09-29 17:24:12", "2015-09-29 17:28:16", "2015-09-29

How to mimic geom_boxplot() with outliers using geom_boxplot(stat = "identity")

霸气de小男生 submitted on 2021-02-07 12:38:21
Question: I would like to pre-compute by-variable summaries of data (with plyr and passing a quantile function) and then plot with geom_boxplot(stat = "identity"). This works great except it (a) does not plot outliers as points and (b) extends the "whiskers" to the max and min of the data being plotted. Example:
library(plyr)
library(ggplot2)
set.seed(4)
df <- data.frame(fact = sample(letters[1:2], 12, replace = TRUE), val = c(1:10, 100, 101))
df
#   fact val
# 1    b   1
# 2    a   2
# 3    a   3
# 4    a   4
# 5    b   5
# 6

Flatten nested list of lists with variable numbers of elements to a data frame

霸气de小男生 submitted on 2021-02-07 10:57:41
Question: I've got a nested list of lists that I'd like to flatten into a data frame with id variables so I know which list elements (and sub-list elements) each came from.
> str(gc_all)
List of 3
 $ 1: num [1:102, 1:2] -74 -73.5 -73 -72.5 -71.9 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:2] "lon" "lat"
 $ 2: num [1:102, 1:2] -74 -73.3 -72.5 -71.8 -71 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:2] "lon" "lat"
 $ 3:List of 2
  ..$ : num [1:37, 1:2] -74 -74.4 -74

R spread dataframe [duplicate]

烂漫一生 submitted on 2021-02-04 21:33:54
Question: This question already has answers here: Reshape multiple value columns to wide format (5 answers). Closed 7 months ago. In R, how can I convert data1 into data2?
data1 = fread("
id year cost pf loss
A 2019-02 155 10 41
B 2019-03 165 14 22
B 2019-01 185 34 56
C 2019-02 350 50 0
A 2019-01 310 40 99")
data2 = fread("
id item 2019-01 2019-02 2019-03
A cost 30 155 NA
A pf 40 10 NA
A loss 99 41 NA
B cost 185 NA 160
B pf 34 NA 14
B loss 56 NA 22
C cost NA 350 NA
C pf NA 50 NA
C loss NA 0 NA")
I

Correlation between two dataframes by row

瘦欲@ submitted on 2021-02-04 10:22:13
Question: I have 2 data frames with 5 columns and 100 rows each.
id price1 price2 price3 price4 price5
1  11.22  25.33  66.47  53.76  77.42
2  33.56  33.77  44.77  34.55  57.42
...
I would like to get the correlation of the corresponding rows, basically
for(i in 1:100){ cor(df1[i, 1:5], df2[i, 1:5]) }
but without using a for-loop. I'm assuming there's some way to use plyr to do it, but I can't seem to get it right. Any suggestions?
Answer 1: Depending on whether you want a cool or fast solution you can use either diag(cor
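The answer above is truncated, so what follows is not a reconstruction of it. It is a minimal base-R sketch of one loop-free way to correlate matching rows, using `sapply` over row indices. The data here are randomly generated stand-ins (an assumption) with the same shape as the question's data frames, minus the id column.

```r
# Hypothetical stand-in data: 100 rows x 5 price columns each
set.seed(1)
df1 <- as.data.frame(matrix(rnorm(500), nrow = 100))
df2 <- as.data.frame(matrix(rnorm(500), nrow = 100))

# cor() needs numeric vectors, and a single data-frame row is still a
# data frame, so convert to matrices first
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)

# Correlate row i of m1 with row i of m2, for every i
row_cor <- sapply(seq_len(nrow(m1)), function(i) cor(m1[i, ], m2[i, ]))

head(row_cor)
```

Converting to matrices up front is the main design choice: it makes `m1[i, ]` a plain numeric vector, which is what `cor` expects for a pairwise correlation.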

which.min within reshape2's dcast()?

萝らか妹 submitted on 2021-01-29 08:31:47
Question: I would like to extract the value of var2 that corresponds to the minimum value of var1 in each building-month combination. Here's my (fake) data set:
head(mydata)
#   building month      var1     var2
#1        A     1 -26.96333 376.9633
#2        A     1 165.38759 317.3993
#3        A     1  47.46345 271.0137
#4        A     2  73.47784 294.8171
#5        A     2 107.80130 371.7668
#6        A     2  10.16384 308.7975
Reproducible code:
## create fake data set:
set.seed(142)
mydata1 = data.frame(building = rep(LETTERS[1:5],6),month = sort(rep(1:6,5)),var1=rnorm(30,50
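The question asks about doing this inside `dcast()`, and the excerpt ends before any answer. As a hedged alternative rather than a dcast solution, the sketch below uses plyr's `ddply` to pick, per building-month group, the var2 value at the position where var1 is smallest. It uses only the six rows shown in the `head(mydata)` output; the result column name `var2_at_min` is illustrative.

```r
library(plyr)

# The six rows printed by head(mydata) in the question
mydata <- data.frame(
  building = rep("A", 6),
  month = c(1, 1, 1, 2, 2, 2),
  var1 = c(-26.96333, 165.38759, 47.46345, 73.47784, 107.80130, 10.16384),
  var2 = c(376.9633, 317.3993, 271.0137, 294.8171, 371.7668, 308.7975)
)

# For each building/month group, which.min(var1) gives the row position
# of the smallest var1; index var2 with it
res <- ddply(mydata, .(building, month), summarise,
             var2_at_min = var2[which.min(var1)])

print(res)
```

If a wide building-by-month layout is needed afterwards, the long result `res` could then be reshaped in a separate step.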

Adding Quality Selector to plyr when using HLS Stream

偶尔善良 submitted on 2021-01-28 19:37:59
Question: I am using plyr as a wrapper around the HTML5 video tag and Hls.js to stream my .m3u8 video. I went through a lot of issues on plyr about enabling quality selectors and came across multiple PRs that raised this question but were closed with a note that the implementation had been merged, until I came across this PR, which is still open but has a custom implementation in the comments that is claimed to work. I was trying that implementation locally in order to check if we can add a quality