reduce

Why does reduce give a StackOverflowError in Clojure?

谁说我不能喝 · Submitted on 2019-11-27 05:35:45
Question: I'm trying to concatenate a Seq of Seqs. I can do it with apply concat:

    user=> (count (apply concat (repeat 3000 (repeat 3000 true))))
    9000000

However, from my limited knowledge, I would assume that the use of apply forces the lazy Seq to be realised, and that doesn't seem right for very large inputs. I'd rather do this lazily if I can, so I thought that using reduce would do the job:

    user=> (count (reduce concat (repeat 3000 (repeat 3000 true))))

But this results in a StackOverflowError.
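The same tension shows up outside Clojure. A rough Python analogue (sizes mirror the question; everything else is illustrative): folding `concat` over the inputs wraps one lazy layer per sequence, while a flat lazy concatenation keeps a single iterator and never blows the stack.

```python
from itertools import chain, repeat

# Lazily concatenate 3000 iterables of 3000 items each with ONE flat
# iterator, instead of nesting one lazy wrapper per input the way
# (reduce concat ...) does.
rows = (repeat(True, 3000) for _ in range(3000))
flat = chain.from_iterable(rows)   # a single flat lazy iterator
total = sum(1 for _ in flat)       # nothing is held in memory at once
print(total)  # 9000000
```

The key point is that `chain.from_iterable` consumes its inputs one at a time at iteration, which is the analogue of Clojure's `(apply concat ...)` staying lazy rather than deeply nested.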

Why is the final reduce step extremely slow in this MapReduce? (HiveQL, HDFS MapReduce)

风流意气都作罢 · Submitted on 2019-11-27 05:22:31
Some background information: I'm working with Dataiku DSS, HDFS, and partitioned datasets. I have a particular job (a Hive query) with two input datasets: one a very large, partitioned dataset, the other a small (~250 rows, 2 columns), non-partitioned dataset. Let's call the partitioned table A and the non-partitioned table B.

Question: The query has the following form:

    SELECT a.f1, f2, ..., fn
    FROM A as a
    LEFT JOIN B as b ON a.f1 = b.f1
    WHERE {PARTITION_FILTER}

Here is the current output from the MapReduce job (keep in mind this job is still running): [09:05:53] [INFO] [dku
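When one side of a join is this small, the usual remedy is a map-side ("broadcast") join: ship the small table to every mapper and skip the expensive shuffle/reduce entirely. A minimal Python sketch of that idea, with made-up data (the table and field names just mirror the question):

```python
# Map-side join in miniature: the small table B becomes an in-memory
# dict keyed by the join column, and the large table A is streamed
# past it, so no sort/shuffle phase is needed.
small_b = {"x": "meta1", "y": "meta2"}              # ~250-row table B, keyed by f1
large_a = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]  # rows of partitioned table A

# LEFT JOIN on a.f1 = b.f1: every A row survives; missing B rows give None.
joined = [(f1, v, small_b.get(f1)) for (f1, v) in large_a]
print(joined)
# [('x', 1, 'meta1'), ('y', 2, 'meta2'), ('x', 3, 'meta1'), ('z', 4, None)]
```

In Hive this is what a map join does for you when the small-table side fits in memory; a single slow reducer in a plain join is often a sign the work is being funneled through the shuffle instead.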

In Stream reduce method, must the identity always be 0 for sum and 1 for multiplication?

随声附和 · Submitted on 2019-11-27 04:32:43
I'm continuing my Java 8 learning and have found some interesting behaviour. Consider this code sample:

    // identity value, accumulator, and combiner
    Integer summaryAge = Person.getPersons().stream()
        //.parallel() // will return a surprising result
        .reduce(1,
                (intermediateResult, p) -> intermediateResult + p.age,
                (ir1, ir2) -> ir1 + ir2);
    System.out.println(summaryAge);

and the model class:

    public class Person {
        String name;
        Integer age;
        ///...
        public static Collection<Person> getPersons() {
            List<Person> persons = new ArrayList<>();
            persons.add(new Person("Vasya", 12));
            persons.add(new Person("Petya", 32));
            persons
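The answer is that the first argument must be a true identity for the accumulator (0 for sum, 1 for product), not just any seed. A Python sketch of why the parallel case is "surprising" (the ages here are hypothetical, standing in for the Person list):

```python
from functools import reduce

ages = [12, 32, 21]  # hypothetical ages, mirroring the Person list

# Correct: the identity for addition is 0.
assert reduce(lambda acc, a: acc + a, ages, 0) == 65

# Seeding with 1 adds the 1 exactly once in the sequential case...
assert reduce(lambda acc, a: acc + a, ages, 1) == 66

# ...but a parallel reduce seeds EVERY chunk with the "identity",
# so a non-identity seed is added once per chunk before combining.
chunks = [[12, 32], [21]]
partials = [reduce(lambda acc, a: acc + a, c, 1) for c in chunks]
combined = reduce(lambda r1, r2: r1 + r2, partials)
print(combined)  # 67: the error grows with the number of chunks
```

This is exactly the contract Stream.reduce states: `combiner(identity, u)` must equal `u`, which fails for 1 under addition.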

Merging more than 2 dataframes in R by rownames

人盡茶涼 · Submitted on 2019-11-27 04:16:30
Question: I gather data from 4 data frames and would like to merge them by row names. I am looking for an efficient way to do this. This is a simplified version of the data I have:

    df1 <- data.frame(N= sample(seq(9, 27, 0.5), 40, replace= T),
                      P= sample(seq(0.3, 4, 0.1), 40, replace= T),
                      C= sample(seq(400, 500, 1), 40, replace= T))
    df2 <- data.frame(origin= sample(c("A", "B", "C", "D", "E"), 40, replace= T),
                      foo1= sample(c(T, F), 40, replace= T),
                      X= sample(seq(145600, 148300, 100), 40, replace= T),
                      Y= sample
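The usual shape of the answer is to fold a two-table merge over the list of tables (in R, `Reduce` over `merge`). A Python sketch of that fold, with toy stand-ins for the data frames (all names and values here are illustrative):

```python
from functools import reduce

# Toy stand-ins for the data frames, keyed by "row name".
df1 = {"r1": {"N": 9.5},      "r2": {"N": 10.0}}
df2 = {"r1": {"origin": "A"}, "r2": {"origin": "B"}}
df3 = {"r1": {"X": 145600},   "r2": {"X": 145700}}

def merge_by_rowname(left, right):
    # Outer-join two tables on their row names, combining the columns.
    keys = left.keys() | right.keys()
    return {k: {**left.get(k, {}), **right.get(k, {})} for k in keys}

merged = reduce(merge_by_rowname, [df1, df2, df3])
print(merged["r1"])  # {'N': 9.5, 'origin': 'A', 'X': 145600}
```

Folding a binary merge generalizes cleanly from 2 tables to any number, which is why `reduce`/`Reduce` is the idiomatic answer to "merge more than 2".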

Java Stream: divide into two lists by boolean predicate

纵然是瞬间 · Submitted on 2019-11-27 02:41:40
Question: I have a list of employees. Each has an isActive boolean field. I would like to divide the employees into two lists: activeEmployees and formerEmployees. Is it possible to do this using the Stream API? What is the most elegant way?

Answer 1: Use Collectors.partitioningBy:

    Map<Boolean, List<Employee>> partitioned = listOfEmployees.stream().collect(
        Collectors.partitioningBy(Employee::isActive));

The resulting map contains two lists, corresponding to whether or not the predicate was matched: List<Employee>
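The same single-pass partitioning idea, sketched in Python (the employee data is made up; the two-bucket map mirrors what `partitioningBy` returns):

```python
# One pass, two buckets keyed by the boolean predicate, mirroring
# Collectors.partitioningBy: the map always has both True and False keys.
employees = [("Ann", True), ("Bob", False), ("Cid", True)]  # (name, is_active)

partitioned = {True: [], False: []}
for name, is_active in employees:
    partitioned[is_active].append(name)

active_employees = partitioned[True]
former_employees = partitioned[False]
print(active_employees, former_employees)  # ['Ann', 'Cid'] ['Bob']
```

Compared with filtering twice, this walks the list once and guarantees both buckets exist even when one is empty.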

Javascript reduce on array of objects

一笑奈何 · Submitted on 2019-11-27 02:33:31
Say I want to sum a.x for each element in arr:

    arr = [{x:1},{x:2},{x:4}]
    arr.reduce(function(a,b){return a.x + b.x})
    >> NaN

I have cause to believe that a.x is undefined at some point. The following works fine:

    arr = [1,2,4]
    arr.reduce(function(a,b){return a + b})
    >> 7

What am I doing wrong in the first example?

After the first iteration you're returning a number and then trying to get property x of it to add to the next object; that property is undefined, and maths involving undefined results in NaN. Try returning an object containing an x property with the sum of the x properties of the parameters: var
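The simpler fix, shown here as a Python sketch of the same fold, is to seed the reduce so the accumulator is always a number and only the element is ever indexed:

```python
from functools import reduce

arr = [{"x": 1}, {"x": 2}, {"x": 4}]

# Without an initial value the accumulator starts as the first *element*,
# and after one step it is a plain number with no "x" key (the NaN bug).
# Seeding with 0 keeps the accumulator a number throughout.
total = reduce(lambda acc, o: acc + o["x"], arr, 0)
print(total)  # 7
```

The JavaScript equivalent is `arr.reduce((acc, o) => acc + o.x, 0)`: same shape, same reason it works.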

Hadoop: key and value are tab-separated in the output file. How to make them semicolon-separated?

依然范特西╮ · Submitted on 2019-11-27 01:55:47
Question: I think the title already explains my question. I would like to change key (tab) value into key;value in all output files the reducers generate from the output of the mappers. I could not find good documentation on this using Google. Can anyone please give a fragment of code showing how to achieve this?

Answer 1: Set the configuration property mapred.textoutputformat.separator to ";".

Answer 2: In lack of better documentation, here's what I've collected: setTextOutputFormatSeparator(final Job
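What that property controls can be sketched in a few lines of Python (the record data is made up): TextOutputFormat simply writes each key/value pair joined by a configurable separator, tab by default.

```python
# TextOutputFormat in miniature: each reducer output record is written
# as key + separator + value; the separator comes from configuration
# (tab by default, ";" once mapred.textoutputformat.separator is set).
records = [("apple", 3), ("pear", 7)]
separator = ";"

lines = [f"{key}{separator}{value}" for key, value in records]
print(lines)  # ['apple;3', 'pear;7']
```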

Spark groupByKey alternative

风流意气都作罢 · Submitted on 2019-11-27 01:29:49
According to Databricks best practices, Spark groupByKey should be avoided, because groupByKey shuffles all of the data across the workers before any processing occurs.

So, my question is: what are the alternatives to groupByKey that will return the following in a distributed and fast way?

    // want this
    {"key1": "1", "key1": "2", "key1": "3", "key2": "55", "key2": "66"}
    // to become this
    {"key1": ["1","2","3"], "key2": ["55","66"]}

It seems to me that maybe aggregateByKey or glom could do it first in the partition ( map ) and
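The idea behind `aggregateByKey`/`combineByKey` is a map-side combine: build small per-partition maps first, then merge those maps, instead of shuffling every individual record. A Python sketch of that two-phase shape (the partition layout and values mirror the question):

```python
from collections import defaultdict

# Two simulated partitions of (key, value) records.
partitions = [[("key1", "1"), ("key2", "55")],
              [("key1", "2"), ("key1", "3"), ("key2", "66")]]

def combine(partition):
    # Map-side combine: collapse a partition into one small local map.
    local = defaultdict(list)
    for k, v in partition:
        local[k].append(v)
    return local

def merge(m1, m2):
    # Reduce side: merge the per-partition maps, not individual records.
    for k, vs in m2.items():
        m1[k].extend(vs)
    return m1

result = merge(*[combine(p) for p in partitions])
print(dict(result))  # {'key1': ['1', '2', '3'], 'key2': ['55', '66']}
```

Note that when the goal really is "all values per key as a list", the shuffle volume is the same either way; the combine step pays off when the per-key result is smaller than the raw values (counts, sums, sets).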

How to break on reduce method

别等时光非礼了梦想. · Submitted on 2019-11-27 01:23:06
How can I break out of the iteration in a reduce? With for:

    for (var i = Things.length - 1; i >= 0; i--) {
        if (Things[i] <= 0) {
            break;
        }
    };

With reduce:

    Things.reduce(function(memo, current){
        if (current <= 0) {
            //break ???
            //return; <-- this will return undefined to memo, which is not what I want
        }
    }, 0)

UPDATE: Some of the commenters make a good point that the original array is being mutated in order to break early inside the .reduce() logic. Therefore, I've modified the answer slightly by adding a .slice(0) before calling a follow-on .reduce() step. This is to preserve the original array by copying its
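An alternative that avoids mutating the array at all is to truncate the input lazily before folding. A Python sketch of that shape (the data is made up; the predicate mirrors the question's `<= 0` check):

```python
from functools import reduce
from itertools import takewhile

things = [5, 3, 2, 0, 7, 9]

# reduce has no "break"; instead, cut the stream off lazily at the
# first non-positive element, then fold what remains. The original
# list is never touched.
kept = takewhile(lambda t: t > 0, things)
total = reduce(lambda acc, t: acc + t, kept, 0)
print(total)  # 10  (5 + 3 + 2; the fold never sees 0, 7, 9)
```

The same "truncate, then fold" decomposition works in JavaScript with a findIndex + slice before the reduce, and it keeps the early-exit logic out of the reducer itself.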

How does reduce function work?

五迷三道 · Submitted on 2019-11-27 00:41:56
Question: As far as I understand, the reduce function takes a list l and a function f. It then calls f on the first two elements of the list and then repeatedly calls f with the next list element and the previous result.

So, I define the following functions. The following function computes the factorial:

    def fact(n):
        if n == 0 or n == 1:
            return 1
        return fact(n-1) * n

    def reduce_func(x,y):
        return fact(x) * fact(y)

    lst = [1, 3, 1]
    print reduce(reduce_func, lst)

Now, shouldn't this
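The likely confusion is that after the first step the accumulator is already a *result*, yet `reduce_func` applies `fact` to it again. A sketch making that visible, and the usual fix of mapping `fact` once per element before folding (Python 3 spelling, with `reduce` imported from functools):

```python
from functools import reduce
from operator import mul

def fact(n):
    return 1 if n in (0, 1) else fact(n - 1) * n

lst = [1, 3, 1]

# reduce(reduce_func, lst) feeds the accumulator back into fact too:
# step 1 is fact(1)*fact(3) = 6, step 2 is fact(6)*fact(1) = 720.
buggy = reduce(lambda x, y: fact(x) * fact(y), lst)
assert buggy == fact(fact(1) * fact(3)) * fact(1)  # 720, not 6

# To multiply the factorials of the elements, apply fact exactly once
# per element, then fold with plain multiplication.
correct = reduce(mul, map(fact, lst))
print(correct)  # 6
```

Keeping the "transform each element" step (`map`) separate from the "combine results" step (`reduce`) avoids re-applying the transform to intermediate results.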