summarization | 易学教程

How to use Open Text Summarizer API?

I'm currently building a system that will summarize an article from a web page such as Wikipedia. I'm able to extract text from web pages, and I know that the Open Text Summarizer API can help me do the summarization, but I don't know how to use it properly. Does anyone know how to use this library? Can you provide a simple example? I'm currently doing my project in C#.

There are a lot of examples on CodePlex. Did you read them? Here is a sample from the WinForms demo:

    SummarizerArguments sumargs = new SummarizerArguments { DictionaryLanguage = "en",

How can I calculate the percentage change within a group for multiple columns in R?

I have a data frame with an ID column, a date column (12 months for each ID), and 23 numeric variables. I would like to obtain the percentage change by month within each ID. I am using the quantmod package to obtain the percent change. Here is an example with only three columns (for simplicity):

    ID Date V1 V2 V3
     1  Jan  2  3  5
     1  Feb  3  4  6
     1  Mar  7  8  9
     2  Jan  1  1  1
     2  Feb  2  3  4
     2  Mar  7  8  8

I tried to use dplyr and the summarise_each function, but that was unsuccessful. More specifically, I tried the following (train is the name of the data set):

    library(dplyr)
    library(quantmod)
    group1
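One possible way to get a month-over-month percentage change within each ID is sketched below in base R rather than dplyr or quantmod. It assumes the rows are already in chronological order within each ID, and the helper name pct_change is made up for this illustration.

```r
# Example data copied from the question.
train <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  Date = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
  V1   = c(2, 3, 7, 1, 2, 7),
  V2   = c(3, 4, 8, 1, 3, 8),
  V3   = c(5, 6, 9, 1, 4, 8)
)

# Hypothetical helper: percentage change from the previous element.
# The first element has no predecessor, so it becomes NA.
pct_change <- function(x) c(NA, diff(x) / head(x, -1)) * 100

value_cols <- c("V1", "V2", "V3")
result <- train
for (col in value_cols) {
  # ave() applies the function within each ID and keeps the original row order.
  result[[col]] <- ave(train[[col]], train$ID, FUN = pct_change)
}
print(result)
```

With 23 variables, value_cols could instead be built as setdiff(names(train), c("ID", "Date")), assuming every remaining column is numeric.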

Identify a value changes' date and summarize the data with sum() and diff() in R

Sample Data:

    product_id <- c("1000","1000","1000","1000","1000","1000",
                    "1002","1002","1002","1002","1002","1002")
    qty_ordered <- c(1,2,1,1,1,1,1,2,1,2,1,1)
    price <- c(2.49,2.49,2.49,1.743,2.49,2.49,
               2.093,2.093,2.11,2.11,2.11, 2.97)
    date <- c("2/23/15","2/23/15", '3/16/15','3/16/15','5/16/15', "6/18/15",
              "2/19/15","3/19/15","3/19/15","3/19/15","3/19/15","4/19/15")
    sampleData <- data.frame(product_id, qty_ordered, price, date)

I would like to identify every time a change in price occurred. Also, I would like to sum() the total qty_ordered between those two price change dates. For
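A minimal base R sketch of one reading of this request follows. The question is cut off before it shows the desired output, so the result layout here is an assumption: one row per run of constant price within a product, with the date of the first order in that run and the quantity summed over the run. It also assumes the rows are already in chronological order within each product. The names changed, run, key and runs are introduced for illustration only.

```r
product_id <- c("1000","1000","1000","1000","1000","1000",
                "1002","1002","1002","1002","1002","1002")
qty_ordered <- c(1,2,1,1,1,1,1,2,1,2,1,1)
price <- c(2.49,2.49,2.49,1.743,2.49,2.49,
           2.093,2.093,2.11,2.11,2.11,2.97)
date <- c("2/23/15","2/23/15","3/16/15","3/16/15","5/16/15","6/18/15",
          "2/19/15","3/19/15","3/19/15","3/19/15","3/19/15","4/19/15")
sampleData <- data.frame(product_id, qty_ordered, price, date,
                         stringsAsFactors = FALSE)

# 1 where the price differs from the previous row of the same product, else 0.
changed <- ave(sampleData$price, sampleData$product_id,
               FUN = function(p) c(0, diff(p) != 0))
# Running count of changes per product: rows with the same count form one price run.
sampleData$run <- ave(changed, sampleData$product_id, FUN = cumsum)

# One output row per (product, run): first date and price of the run, plus total quantity.
key <- paste(sampleData$product_id, sampleData$run)
runs <- sampleData[!duplicated(key), c("product_id", "date", "price")]
runs$total_qty <- as.vector(tapply(sampleData$qty_ordered,
                                   factor(key, levels = unique(key)), sum))
print(runs)
```

The dates are left as strings here; converting them with as.Date(date, format = "%m/%d/%y") would be needed before sorting unsorted data.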

relative windowed running sum through data.table non-equi join

I have a data set (customerId, transactionDate, productId, purchaseQty) loaded into a data.table. For each row, I want to calculate the sum and mean of purchaseQty for the prior 45 days.

       productId customerID transactionDate purchaseQty
    1:    870826    1186951      2016-03-28      162000
    2:    870826    1244216      2016-03-31        5000
    3:    870826    1244216      2016-04-08        6500
    4:    870826    1308671      2016-03-28      221367
    5:    870826    1308671      2016-03-29       83633
    6:    870826    1308671      2016-11-29       60500

I'm looking for an output like this:

       productId customerID transactionDate purchaseQty sumWindowPurchases
    1:    870826    1186951      2016-03-28      162000             162000
    2:    870826
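The windowed sum and mean can be illustrated with a plain base R loop over rows. This is not the data.table non-equi join named in the title, only a small reference sketch for checking results. It assumes the window is per productId and customerID, includes the current transaction, and reaches back exactly 45 days inclusive; the name tx, the helper in_window and the column meanWindowPurchases are made up for this example.

```r
# The six rows shown in the question.
tx <- data.frame(
  productId       = rep(870826, 6),
  customerID      = c(1186951, 1244216, 1244216, 1308671, 1308671, 1308671),
  transactionDate = as.Date(c("2016-03-28", "2016-03-31", "2016-04-08",
                              "2016-03-28", "2016-03-29", "2016-11-29")),
  purchaseQty     = c(162000, 5000, 6500, 221367, 83633, 60500)
)

window_days <- 45  # assumed window length, taken from "prior 45 days"

# Hypothetical helper: logical mask of rows that fall in row i's window.
in_window <- function(i) {
  tx$productId == tx$productId[i] &
    tx$customerID == tx$customerID[i] &
    tx$transactionDate <= tx$transactionDate[i] &
    tx$transactionDate >= tx$transactionDate[i] - window_days
}

rows <- seq_len(nrow(tx))
tx$sumWindowPurchases  <- sapply(rows, function(i) sum(tx$purchaseQty[in_window(i)]))
tx$meanWindowPurchases <- sapply(rows, function(i) mean(tx$purchaseQty[in_window(i)]))
print(tx)
```

This compares every row against every other row, so it scales quadratically; it is meant for small data or for validating a join-based version.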

Return most frequent string value for each group [duplicate]

This question already has answers here: "How to select the row with the maximum value in each group" (10 answers) and "How to select the rows with maximum values in each group with dplyr? [duplicate]" (6 answers). Closed 7 months ago.

    a <- c(rep(1:2,3))
    b <- c("A","A","B","B","B","B")
    df <- data.frame(a,b)

    > str(b)
     chr [1:6] "A" "A" "B" "B" "B" "B"

      a b
    1 1 A
    2 2 A
    3 1 B
    4 2 B
    5 1 B
    6 2 B

I want to group by variable a and return the most frequent value of b. My desired result would look like:

      a b
    1 1 B
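A short base R sketch of the per-group mode, using the data from the question. The helper name most_frequent is invented here, and ties are resolved by taking the first value in sorted order, which is an assumption the question does not address.

```r
a <- c(rep(1:2, 3))
b <- c("A", "A", "B", "B", "B", "B")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Hypothetical helper: the value with the highest count.
# which.max() returns the first maximum, so ties go to the first label in table() order.
most_frequent <- function(x) names(which.max(table(x)))

# Apply the helper to b within each level of a.
result <- aggregate(b ~ a, data = df, FUN = most_frequent)
print(result)
```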

How to install the Python package pyrouge on Microsoft Windows?

I want to use the Python package pyrouge on Microsoft Windows. The package doesn't give any instructions on how to install it on Microsoft Windows. How can I do so?

The following instructions were tested on Windows 7 SP1 x64 Ultimate and Python 3.5 x64 (Anaconda).

1) In cmd.exe, run pip install pyrouge
2) Download ROUGE-1.5.5. You may download it from https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5
3) pyrouge comes with a Python script named pyrouge_set_rouge_path (it has no file extension for some reason), which you need to run in order to point pyrouge to the

Summarise over all columns

I have data of the following format:

    gen = function () sample.int(10, replace = TRUE)
    x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())

I would now like to attach, to each row, the total sum of all the elements in the row (my actual function is more complex, but sum illustrates the problem). Without dplyr, I'd write

    cbind(x, Sum = apply(x, 1, sum))

Resulting in:

      A C  G T Sum
    1 3 1  6 9  19
    2 3 4  3 3  13
    3 3 1 10 5  19
    4 7 2  1 6  16
    …

But it seems surprisingly hard to do this with dplyr. I
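For the sum case specifically, one way this might be written with dplyr is sketched below. It assumes dplyr 1.0 or later (for across()) and only runs the dplyr part when the package is installed; the fixed seed and the names expected and with_sum are additions for this illustration.

```r
set.seed(1)  # fixed seed so the example is repeatable
gen <- function() sample.int(10, replace = TRUE)
x <- data.frame(A = gen(), C = gen(), G = gen(), T = gen())

# Base R version from the question, kept as a reference result.
expected <- cbind(x, Sum = apply(x, 1, sum))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # across(everything()) passes all columns to rowSums() as one data frame,
  # and rowSums() adds up the values in each row.
  with_sum <- dplyr::mutate(x, Sum = rowSums(dplyr::across(dplyr::everything())))
  print(with_sum)
} else {
  print(expected)
}
```

rowSums() only covers the summing case. For a row function that is not vectorised, dplyr's rowwise() combined with c_across() is a commonly used alternative, at the cost of speed on large data.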
