summarization | 易学教程

Using conditions in group_by()/summarize() loop

Question: I have a data frame that looks something like this (I have many more years and variables):

Name    State2014  State2015  State2016  Tuition2014  Tuition2015  Tuition2016  StateGrants2014
Jared   CA         CA         MA         22430        23060        40650        5000
Beth    CA         CA         CA         36400        37050        37180        4200
Steven  MA         MA         MA         18010        18250        18720        NA
Lary    MA         CA         MA         24080        30800        24600        6600
Tom     MA         OR         OR         40450        15800        16040        NA
Alfred  OR         OR         OR         23570        23680        23750        3500
Cathy   OR         OR         OR         32070        32070        33040        4700

My objective (in this example) is to get the mean tuition
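The question is cut off, so the exact target is not visible. One common reading is "mean tuition per state for each year", where each year's tuition is paired with the state in that same year's State column. The sketch below shows that logic in plain Python (standard library only, not R/dplyr), and it assumes that reading of the objective. In dplyr, people usually get the same result by reshaping the wide columns into long format first (for example with tidyr::pivot_longer()) and then running group_by() and summarise().

```python
from collections import defaultdict

# Rows from the question's table: name, states 2014-2016, tuition 2014-2016.
# StateGrants2014 is left out because this sketch only averages tuition.
YEARS = (2014, 2015, 2016)
ROWS = [
    ("Jared", ("CA", "CA", "MA"), (22430, 23060, 40650)),
    ("Beth", ("CA", "CA", "CA"), (36400, 37050, 37180)),
    ("Steven", ("MA", "MA", "MA"), (18010, 18250, 18720)),
    ("Lary", ("MA", "CA", "MA"), (24080, 30800, 24600)),
    ("Tom", ("MA", "OR", "OR"), (40450, 15800, 16040)),
    ("Alfred", ("OR", "OR", "OR"), (23570, 23680, 23750)),
    ("Cathy", ("OR", "OR", "OR"), (32070, 32070, 33040)),
]


def mean_tuition_by_state_year(rows, years):
    """Return {(year, state): mean tuition}.

    Each year's tuition is counted under the state the student was in
    during that same year (an assumed reading of the objective).
    """
    totals = defaultdict(lambda: [0, 0])  # (year, state) -> [sum, count]
    for _name, states, tuitions in rows:
        for year, state, tuition in zip(years, states, tuitions):
            acc = totals[(year, state)]
            acc[0] += tuition
            acc[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


means = mean_tuition_by_state_year(ROWS, YEARS)
print(means[(2014, "CA")])
```

Because the state is looked up per year, students who moved (Jared, Lary, Tom) add to different state averages in different years.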

data.table: Using with=False and transforming function/summary function?

Question: I want to summarise several variables in a data.table, with the output in wide format, possibly as one list per variable. Several other approaches did not work, so I tried an outer lapply that passes in the variable names as character vectors. I wanted to use them inside the data.table call with with=FALSE:

carsx = as.data.table(cars)
lapply(
  list(speed = "speed", dist = "dist"),  # error: object 'ansvals' not found
  function(x) carsx[, list(mean(x), min(x), max(x)), with = FALSE]
)

Since this does not work, I tried
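The goal is one small summary (mean, min, max) per variable, keyed by that variable's name. The plain-Python sketch below shows that shape. The toy columns are hypothetical stand-ins for cars$speed and cars$dist, not the real dataset. In data.table itself, the usual ways to use a column whose name is stored in a string are get(x) inside j, or .SD together with .SDcols = x. This sketch does not reproduce that syntax.

```python
from statistics import mean

# Hypothetical toy columns standing in for cars$speed and cars$dist.
cars_like = {
    "speed": [4, 7, 8, 9],
    "dist": [2, 4, 16, 10],
}


def summarise_columns(data, columns):
    """Return {column_name: {"mean": ..., "min": ..., "max": ...}}.

    Each column is looked up by its name (a string), which is the step
    the with=FALSE attempt in the question was trying to achieve.
    """
    return {
        name: {"mean": mean(data[name]), "min": min(data[name]), "max": max(data[name])}
        for name in columns
    }


result = summarise_columns(cars_like, ["speed", "dist"])
print(result["speed"])
```

The result has one entry per variable, so it is already "a list per variable". Turning it into one wide row is a matter of flattening the keys, for example speed_mean, speed_min, and so on.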

R: nested grouped summaries with dplyr?

Question: I'm trying to practise the R dplyr package with a hypothetical dataset (link to pastebin) of people's drinking records at different bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1
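The rest of the question is cut off, so the intended summary is not known. A typical nested grouped summary works in two stages. The first stage aggregates per (bar, person), and the second aggregates those per-person results per bar. In dplyr that means two group_by()/summarise() steps in a row. The plain-Python sketch below shows the two-stage idea. It assumes the goal is "average total drinks ordered per person, for each bar", and it uses only the complete rows from the question (the truncated last row is left out).

```python
from collections import defaultdict

# Complete rows from the question:
# (bar_name, person, drink_ordered, times_ordered, liked_it)
ORDERS = [
    ("Moe’s Tavern", "Homer", "Romulan ale", 2, True),
    ("Moe’s Tavern", "Homer", "Scotch whiskey", 1, False),
    ("Moe’s Tavern", "Guinan", "Romulan ale", 1, True),
    ("Moe’s Tavern", "Guinan", "Scotch whiskey", 3, False),
    ("Moe’s Tavern", "Rebecca", "Romulan ale", 2, False),
    ("Moe’s Tavern", "Rebecca", "Scotch whiskey", 4, True),
    ("Cheers", "Rebecca", "Budweiser", 1, True),
]


def mean_drinks_per_person_by_bar(orders):
    """Two-stage (nested) summary; the target statistic is an assumption."""
    # Stage 1: total times_ordered per (bar, person).
    per_person = defaultdict(int)
    for bar, person, _drink, times, _liked in orders:
        per_person[(bar, person)] += times
    # Stage 2: average those per-person totals within each bar.
    per_bar = defaultdict(list)
    for (bar, _person), total in per_person.items():
        per_bar[bar].append(total)
    return {bar: sum(totals) / len(totals) for bar, totals in per_bar.items()}


summary = mean_drinks_per_person_by_bar(ORDERS)
print(summary)
```

Any other per-bar statistic (maximum, share of liked drinks, and so on) slots into stage 2 in the same way.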

ROUGE evaluation method gives zero value

Question: I have set all parameters as described in http://kavita-ganesan.com/rouge-howto, but I get zero values for precision, recall and F1. Please help: what can I do?

Answer 1: If you have set all parameters correctly and get no errors when running ROUGE, you are probably making the following mistake when writing your summary files in HTML format. ROUGE does not handle whitespace properly, so

<a name="1">[1]</a> <a href="#1" id= 1>
<a name="1">[1]</a> <a href="#1" id=1>

are not the same
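One way to avoid this kind of mismatch is to generate every summary file from the same template, so the anchor markup is consistent byte for byte. The sketch below builds one summary in the anchor format quoted in the answer, with no space after id=. The surrounding html/head/body skeleton is an assumption based on common ROUGE-1.5.5 examples. Check it against the how-to you are following.

```python
def rouge_summary_html(title, sentences):
    """Build one summary file with one anchored line per sentence.

    Each line follows the answer's format exactly, with no space after
    'id=':  <a name="N">[N]</a> <a href="#N" id=N>sentence</a>
    The html/head/body wrapper is an assumed skeleton.
    """
    lines = [
        "<html>",
        f"<head><title>{title}</title></head>",
        '<body bgcolor="white">',
    ]
    for i, sentence in enumerate(sentences, start=1):
        # strip() drops stray leading/trailing whitespace from each sentence.
        lines.append(f'<a name="{i}">[{i}]</a> <a href="#{i}" id={i}>{sentence.strip()}</a>')
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


html = rouge_summary_html("summary1", ["First sentence.", "  Second sentence. "])
print(html)
```

Writing every system and model summary through one function like this, for example with open(path, "w", encoding="utf-8"), keeps stray spaces such as id= 1 out of the files.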

Dataset link for Text Summarization?

Question: Does anyone have a download link for a text summarization dataset such as DUC 2007 or TREC? Please help.

Answer 1: You can use http://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports for an extraction-based text summarization approach. It contains catchphrases, which can act as the selected sentences for training, although catchphrases may not be entirely appropriate for this purpose.

Answer 2: You can access the DUC dataset after completing the required organizational and individual agreements; see http://www-nlpir.nist.gov/projects/duc

dplyr idiom for summarize() a filtered-group-by, and also replace any NAs due to missing rows

Question: I am computing a dplyr::summarize across a data frame of sales data. I group by (S, D, Y), then compute medians and means for weeks 5..43 within each group, and then merge those results back into the parent df. Variable X is sales. X is never NA (there are no explicit NAs anywhere in df). However, if there is no data (that is, no sales) for a given S, D, Y and set of weeks, there is simply no row with those values in df (take this to mean zero sales for that particular set of parameters). In other
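The core idea is to compute the statistics only on the filtered rows, but to key the result by every (S, D, Y) group in the full data. That way, a group with no rows in weeks 5..43 gets 0 instead of NA. The plain-Python sketch below shows this. The week column W and the sample rows are hypothetical. In dplyr, the equivalent pattern is usually a summarise() on the filtered data, followed by a left join back onto the full set of groups, with the resulting NAs replaced by 0 (for example with coalesce() or tidyr::replace_na()).

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows: (S, D, Y, W, X), where W is the week and X is sales.
ROWS = [
    ("s1", "d1", 2015, 3, 10),
    ("s1", "d1", 2015, 6, 20),
    ("s1", "d1", 2015, 10, 40),
    ("s2", "d1", 2015, 2, 5),  # no rows in weeks 5..43 for this group
]


def weekly_stats(rows, first_week=5, last_week=43):
    """Return {(S, D, Y): (median_X, mean_X)} over weeks first_week..last_week.

    Every group seen anywhere in rows is kept; a group with no rows in the
    week window (no sales) gets (0, 0) instead of a missing value.
    """
    all_groups = {(s, d, y) for s, d, y, _w, _x in rows}
    in_window = defaultdict(list)
    for s, d, y, w, x in rows:
        if first_week <= w <= last_week:
            in_window[(s, d, y)].append(x)
    result = {}
    for key in all_groups:
        xs = in_window.get(key)  # .get() does not create empty entries
        result[key] = (median(xs), mean(xs)) if xs else (0, 0)
    return result


stats = weekly_stats(ROWS)
print(stats[("s1", "d1", 2015)], stats[("s2", "d1", 2015)])
```

Merging back into the parent data then becomes a lookup of each row's (S, D, Y) key in the returned mapping.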

Get dplyr count of distinct in a readable way

Question: I'm new to dplyr, and I need to count the distinct values in a group. Here's an example table:

data = data.frame(aa = c(1, 2, 3, 4, NA), bb = c('a', 'b', 'a', 'c', 'c'))

I know I can do things like:

by_bb <- group_by(data, bb, add = TRUE)
summarise(by_bb, mean(aa, na.rm = TRUE), max(aa), sum(!is.na(aa)), length(aa))

But what if I want the count of unique elements? I can do:

> summarise(by_bb, length(unique(unlist(aa))))
  bb length(unique(unlist(aa)))
1  a                          2
2  b                          1
3  c                          2

and if I want to exclude NAs I can do:
>
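For reference, dplyr has a dedicated helper for this, n_distinct(), which also accepts na.rm = TRUE. The counting logic itself is small. The plain-Python sketch below uses the question's data, with None standing in for R's NA, and shows both the "keep NA" and "drop NA" variants.

```python
from collections import defaultdict

# The question's data; None plays the role of R's NA.
aa = [1, 2, 3, 4, None]
bb = ["a", "b", "a", "c", "c"]


def distinct_counts(values, groups, drop_missing=False):
    """Count distinct values per group; optionally ignore missing (None)."""
    seen = defaultdict(set)
    for value, group in zip(values, groups):
        bucket = seen[group]  # creates the group even if all its values are dropped
        if drop_missing and value is None:
            continue
        bucket.add(value)
    return {group: len(vals) for group, vals in seen.items()}


print(distinct_counts(aa, bb))                     # {'a': 2, 'b': 1, 'c': 2}
print(distinct_counts(aa, bb, drop_missing=True))  # {'a': 2, 'b': 1, 'c': 1}
```

The first result matches the length(unique(...)) output shown in the question. The second drops the NA in group c.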

How to install the Python package pyrouge on Microsoft Windows?

Question: I want to use the Python package pyrouge on Microsoft Windows. The package doesn't give any instructions on how to install it on Windows. How can I do so?

Answer 1: The following instructions were tested on Windows 7 SP1 x64 Ultimate and Python 3.5 x64 (Anaconda).

1) In cmd.exe, run: pip install pyrouge
2) Download ROUGE-1.5.5. You may download it from https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5
3) pyrouge comes with a Python script named pyrouge_set_rouge_path

disaggregate summarised table in SQL Server 2008

Question: I've received data from an external source in a summarised format. I need a way to disaggregate it so that it fits into a system I am using. To illustrate, suppose the data I received looks like this:

receivedTable:
Age  Gender  Count
40   M       3
41   M       2

I want it in a disaggregated format like this:

systemTable:
ID  Age  Gender
1   40   M
2   40   M
3   40   M
4   41   M
5   41   M

Thanks,
Karl

Answer 1: Depending on the range of your counts, you could use a lookup table that holds exactly x records for each integer x. Like
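The lookup-table idea amounts to joining each summary row to a numbers table 1..N and keeping the numbers n <= Count. A row with Count = 3 therefore produces three rows. The plain-Python sketch below shows that join logic (it is not T-SQL) and assigns sequential IDs afterwards. In SQL Server 2008, the ID step is commonly done with ROW_NUMBER().

```python
# receivedTable from the question: (Age, Gender, Count)
RECEIVED = [(40, "M", 3), (41, "M", 2)]


def disaggregate(received):
    """Expand each (Age, Gender, Count) row into Count rows with sequential IDs."""
    max_count = max((count for _age, _gender, count in received), default=0)
    numbers = range(1, max_count + 1)  # the "numbers" lookup table: 1..max Count
    expanded = [
        (age, gender)
        for age, gender, count in received
        for n in numbers
        if n <= count  # join condition: keep numbers up to this row's Count
    ]
    # Assign IDs in order, similar to ROW_NUMBER() over the expanded rows.
    return [(i, age, gender) for i, (age, gender) in enumerate(expanded, start=1)]


for row in disaggregate(RECEIVED):
    print(row)
```

The numbers table only needs to be as large as the biggest Count. That is why the answer says the approach depends on the range of your counts.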

obtaining 3 most common elements of groups, concatenating ties, and ignoring less common values

Question: I am trying to get the 3 most common numbers per group of a data frame, using a function, while ignoring the less common values (per group) and allowing a unique number if present. The answer with the lowest system.time will be accepted.

#my current function
library(plyr)
get.3modes.andcounts <- function(origtable, groupby, columnname) {
  data <- ddply(origtable, groupby, .fun = function(xx) {
    c(m1 = paste(names(sort(table(xx[, columnname]), decreasing = TRUE)[1])),
      m2 = paste(names(sort(table(xx[, columnname])
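The question is truncated, so its exact tie rule is not visible. This sketch assumes that "concatenating ties" means values with the same frequency are joined into one slot, so m1, m2 and m3 correspond to the three highest distinct frequencies. Under that assumption, the plain-Python sketch below handles a single group. The comma separator, and the None padding for groups with fewer than three frequency levels, are illustrative choices.

```python
from collections import Counter


def top3_modes(values, sep=","):
    """Return [m1, m2, m3] for one group.

    Values with equal frequency are concatenated into the same slot;
    anything below the third-highest frequency is ignored. Slots with
    no corresponding frequency level are None.
    """
    counts = Counter(values)
    levels = sorted(set(counts.values()), reverse=True)[:3]
    modes = [
        sep.join(str(v) for v in sorted(val for val, c in counts.items() if c == level))
        for level in levels
    ]
    return modes + [None] * (3 - len(modes))


# Hypothetical groups; in the question these would come from the grouping column.
groups = {"g1": [1, 1, 1, 2, 2, 3, 3, 4, 5], "g2": [7]}
for name, vals in groups.items():
    print(name, top3_modes(vals))
```

Counting each group once with Counter avoids the repeated table()/sort() calls made for every slot in the original function. That repetition is likely where much of its time goes.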