reduce

Why does reduce give a StackOverflowError in Clojure?

谁说我不能喝 · Submitted on 2019-11-27 05:35:45
Question: I'm trying to concatenate a Seq of Seqs. I can do it with apply concat:

    user=> (count (apply concat (repeat 3000 (repeat 3000 true))))
    9000000

However, from my limited knowledge, I would assume that the use of apply forces the lazy Seq to be realised, and that doesn't seem right for very large inputs. I'd rather do this lazily if I can, so I thought that using reduce would do the job:

    user=> (count (reduce concat (repeat 3000 (repeat 3000 true))))

But this results in a StackOverflowError.
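The same tension shows up outside Clojure. A rough Python analogue (sizes mirror the question; everything else is illustrative): folding `concat` over the inputs wraps one lazy layer per sequence, while a flat lazy concatenation keeps a single iterator and never blows the stack.

```python
from itertools import chain, repeat

# Lazily concatenate 3000 iterables of 3000 items each with ONE flat
# iterator, instead of nesting one lazy wrapper per input the way
# (reduce concat ...) does.
rows = (repeat(True, 3000) for _ in range(3000))
flat = chain.from_iterable(rows)   # a single flat lazy iterator
total = sum(1 for _ in flat)       # nothing is held in memory at once
print(total)  # 9000000
```

The key point is that `chain.from_iterable` consumes its inputs one at a time at iteration, which is the analogue of Clojure's `(apply concat ...)` staying lazy rather than deeply nested.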

Why is the final reduce step extremely slow in this MapReduce? (HiveQL, HDFS MapReduce)

风流意气都作罢 · Submitted on 2019-11-27 05:22:31
Some background information: I'm working with Dataiku DSS, HDFS, and partitioned datasets. I have a particular job (a Hive query) with two input datasets: one a very large, partitioned dataset, the other a small (~250 rows, 2 columns), non-partitioned dataset. Let's call the partitioned table A and the non-partitioned table B.

Question: The query has the following form:

    SELECT a.f1, f2, ..., fn
    FROM A as a
    LEFT JOIN B as b ON a.f1 = b.f1
    WHERE {PARTITION_FILTER}

Here is the current output from the MapReduce job (keep in mind this job is still running): [09:05:53] [INFO] [dku
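When one side of a join is this small, the usual remedy is a map-side ("broadcast") join: ship the small table to every mapper and skip the expensive shuffle/reduce entirely. A minimal Python sketch of that idea, with made-up data (the table and field names just mirror the question):

```python
# Map-side join in miniature: the small table B becomes an in-memory
# dict keyed by the join column, and the large table A is streamed
# past it, so no sort/shuffle phase is needed.
small_b = {"x": "meta1", "y": "meta2"}              # ~250-row table B, keyed by f1
large_a = [("x", 1), ("y", 2), ("x", 3), ("z", 4)]  # rows of partitioned table A

# LEFT JOIN on a.f1 = b.f1: every A row survives; missing B rows give None.
joined = [(f1, v, small_b.get(f1)) for (f1, v) in large_a]
print(joined)
# [('x', 1, 'meta1'), ('y', 2, 'meta2'), ('x', 3, 'meta1'), ('z', 4, None)]
```

In Hive this is what a map join does for you when the small-table side fits in memory; a single slow reducer in a plain join is often a sign the work is being funneled through the shuffle instead.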

In Stream reduce method, must the identity always be 0 for sum and 1 for multiplication?

随声附和 · Submitted on 2019-11-27 04:32:43
I'm continuing my Java 8 learning and have found some interesting behaviour. Consider this code sample:

    // identity value, accumulator, and combiner
    Integer summaryAge = Person.getPersons().stream()
        //.parallel() // will return a surprising result
        .reduce(1,
                (intermediateResult, p) -> intermediateResult + p.age,
                (ir1, ir2) -> ir1 + ir2);
    System.out.println(summaryAge);

and the model class:

    public class Person {
        String name;
        Integer age;
        ///...
        public static Collection<Person> getPersons() {
            List<Person> persons = new ArrayList<>();
            persons.add(new Person("Vasya", 12));
            persons.add(new Person("Petya", 32));
            persons
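The answer is that the first argument must be a true identity for the accumulator (0 for sum, 1 for product), not just any seed. A Python sketch of why the parallel case is "surprising" (the ages here are hypothetical, standing in for the Person list):

```python
from functools import reduce

ages = [12, 32, 21]  # hypothetical ages, mirroring the Person list

# Correct: the identity for addition is 0.
assert reduce(lambda acc, a: acc + a, ages, 0) == 65

# Seeding with 1 adds the 1 exactly once in the sequential case...
assert reduce(lambda acc, a: acc + a, ages, 1) == 66

# ...but a parallel reduce seeds EVERY chunk with the "identity",
# so a non-identity seed is added once per chunk before combining.
chunks = [[12, 32], [21]]
partials = [reduce(lambda acc, a: acc + a, c, 1) for c in chunks]
combined = reduce(lambda r1, r2: r1 + r2, partials)
print(combined)  # 67: the error grows with the number of chunks
```

This is exactly the contract Stream.reduce states: `combiner(identity, u)` must equal `u`, which fails for 1 under addition.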

Merging more than 2 dataframes in R by rownames

人盡茶涼 · Submitted on 2019-11-27 04:16:30
Question: I gather data from 4 data frames and would like to merge them by row names. I am looking for an efficient way to do this. This is a simplified version of the data I have:

    df1 <- data.frame(N= sample(seq(9, 27, 0.5), 40, replace= T),
                      P= sample(seq(0.3, 4, 0.1), 40, replace= T),
                      C= sample(seq(400, 500, 1), 40, replace= T))
    df2 <- data.frame(origin= sample(c("A", "B", "C", "D", "E"), 40, replace= T),
                      foo1= sample(c(T, F), 40, replace= T),
                      X= sample(seq(145600, 148300, 100), 40, replace= T),
                      Y= sample
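The usual shape of the answer is to fold a two-table merge over the list of tables (in R, `Reduce` over `merge`). A Python sketch of that fold, with toy stand-ins for the data frames (all names and values here are illustrative):

```python
from functools import reduce

# Toy stand-ins for the data frames, keyed by "row name".
df1 = {"r1": {"N": 9.5},      "r2": {"N": 10.0}}
df2 = {"r1": {"origin": "A"}, "r2": {"origin": "B"}}
df3 = {"r1": {"X": 145600},   "r2": {"X": 145700}}

def merge_by_rowname(left, right):
    # Outer-join two tables on their row names, combining the columns.
    keys = left.keys() | right.keys()
    return {k: {**left.get(k, {}), **right.get(k, {})} for k in keys}

merged = reduce(merge_by_rowname, [df1, df2, df3])
print(merged["r1"])  # {'N': 9.5, 'origin': 'A', 'X': 145600}
```

Folding a binary merge generalizes cleanly from 2 tables to any number, which is why `reduce`/`Reduce` is the idiomatic answer to "merge more than 2".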

Java Stream: divide into two lists by boolean predicate

纵然是瞬间 · Submitted on 2019-11-27 02:41:40
Question: I have a list of employees. Each has an isActive boolean field. I would like to divide the employees into two lists: activeEmployees and formerEmployees. Is it possible to do this using the Stream API? What is the most elegant way?

Answer 1: Use Collectors.partitioningBy:

    Map<Boolean, List<Employee>> partitioned = listOfEmployees.stream().collect(
        Collectors.partitioningBy(Employee::isActive));

The resulting map contains two lists, corresponding to whether or not the predicate was matched: List<Employee>
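The same single-pass partitioning idea, sketched in Python (the employee data is made up; the two-bucket map mirrors what `partitioningBy` returns):

```python
# One pass, two buckets keyed by the boolean predicate, mirroring
# Collectors.partitioningBy: the map always has both True and False keys.
employees = [("Ann", True), ("Bob", False), ("Cid", True)]  # (name, is_active)

partitioned = {True: [], False: []}
for name, is_active in employees:
    partitioned[is_active].append(name)

active_employees = partitioned[True]
former_employees = partitioned[False]
print(active_employees, former_employees)  # ['Ann', 'Cid'] ['Bob']
```

Compared with filtering twice, this walks the list once and guarantees both buckets exist even when one is empty.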

Javascript reduce on array of objects

一笑奈何 · Submitted on 2019-11-27 02:33:31
Say I want to sum a.x for each element in arr:

    arr = [{x:1},{x:2},{x:4}]
    arr.reduce(function(a,b){return a.x + b.x})
    >> NaN

I have cause to believe that a.x is undefined at some point. The following works fine:

    arr = [1,2,4]
    arr.reduce(function(a,b){return a + b})
    >> 7

What am I doing wrong in the first example?

After the first iteration you're returning a number and then trying to get property x of it to add to the next object; that property is undefined, and maths involving undefined results in NaN. Try returning an object containing an x property with the sum of the x properties of the parameters: var
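The simpler fix, shown here as a Python sketch of the same fold, is to seed the reduce so the accumulator is always a number and only the element is ever indexed:

```python
from functools import reduce

arr = [{"x": 1}, {"x": 2}, {"x": 4}]

# Without an initial value the accumulator starts as the first *element*,
# and after one step it is a plain number with no "x" key (the NaN bug).
# Seeding with 0 keeps the accumulator a number throughout.
total = reduce(lambda acc, o: acc + o["x"], arr, 0)
print(total)  # 7
```

The JavaScript equivalent is `arr.reduce((acc, o) => acc + o.x, 0)`: same shape, same reason it works.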

Hadoop: key and value are tab-separated in the output file. How to make them semicolon-separated?

依然范特西╮ · Submitted on 2019-11-27 01:55:47
Question: I think the title already explains my question. I would like to change key (tab) value into key;value in all output files the reducers generate from the output of the mappers. I could not find good documentation on this using Google. Can anyone please give a fragment of code showing how to achieve this?

Answer 1: Set the configuration property mapred.textoutputformat.separator to ";".

Answer 2: In lack of better documentation, here's what I've collected: setTextOutputFormatSeparator(final Job
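What that property controls can be sketched in a few lines of Python (the record data is made up): TextOutputFormat simply writes each key/value pair joined by a configurable separator, tab by default.

```python
# TextOutputFormat in miniature: each reducer output record is written
# as key + separator + value; the separator comes from configuration
# (tab by default, ";" once mapred.textoutputformat.separator is set).
records = [("apple", 3), ("pear", 7)]
separator = ";"

lines = [f"{key}{separator}{value}" for key, value in records]
print(lines)  # ['apple;3', 'pear;7']
```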

Spark groupByKey alternative

风流意气都作罢 · Submitted on 2019-11-27 01:29:49
According to Databricks best practices, Spark groupByKey should be avoided, because groupByKey shuffles all of the data across the workers before any processing occurs.

So, my question is: what are the alternatives to groupByKey that will return the following in a distributed and fast way?

    // want this
    {"key1": "1", "key1": "2", "key1": "3", "key2": "55", "key2": "66"}
    // to become this
    {"key1": ["1","2","3"], "key2": ["55","66"]}

It seems to me that maybe aggregateByKey or glom could do it first in the partition ( map ) and
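The idea behind `aggregateByKey`/`combineByKey` is a map-side combine: build small per-partition maps first, then merge those maps, instead of shuffling every individual record. A Python sketch of that two-phase shape (the partition layout and values mirror the question):

```python
from collections import defaultdict

# Two simulated partitions of (key, value) records.
partitions = [[("key1", "1"), ("key2", "55")],
              [("key1", "2"), ("key1", "3"), ("key2", "66")]]

def combine(partition):
    # Map-side combine: collapse a partition into one small local map.
    local = defaultdict(list)
    for k, v in partition:
        local[k].append(v)
    return local

def merge(m1, m2):
    # Reduce side: merge the per-partition maps, not individual records.
    for k, vs in m2.items():
        m1[k].extend(vs)
    return m1

result = merge(*[combine(p) for p in partitions])
print(dict(result))  # {'key1': ['1', '2', '3'], 'key2': ['55', '66']}
```

Note that when the goal really is "all values per key as a list", the shuffle volume is the same either way; the combine step pays off when the per-key result is smaller than the raw values (counts, sums, sets).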

How to break on reduce method

别等时光非礼了梦想. · Submitted on 2019-11-27 01:23:06
How can I break out of the iteration in a reduce? With for:

    for (var i = Things.length - 1; i >= 0; i--) {
        if (Things[i] <= 0) {
            break;
        }
    };

With reduce:

    Things.reduce(function(memo, current){
        if (current <= 0) {
            //break ???
            //return; <-- this will return undefined to memo, which is not what I want
        }
    }, 0)

UPDATE: Some of the commenters make a good point that the original array is being mutated in order to break early inside the .reduce() logic. Therefore, I've modified the answer slightly by adding a .slice(0) before calling a follow-on .reduce() step. This is to preserve the original array by copying its
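An alternative that avoids mutating the array at all is to truncate the input lazily before folding. A Python sketch of that shape (the data is made up; the predicate mirrors the question's `<= 0` check):

```python
from functools import reduce
from itertools import takewhile

things = [5, 3, 2, 0, 7, 9]

# reduce has no "break"; instead, cut the stream off lazily at the
# first non-positive element, then fold what remains. The original
# list is never touched.
kept = takewhile(lambda t: t > 0, things)
total = reduce(lambda acc, t: acc + t, kept, 0)
print(total)  # 10  (5 + 3 + 2; the fold never sees 0, 7, 9)
```

The same "truncate, then fold" decomposition works in JavaScript with a findIndex + slice before the reduce, and it keeps the early-exit logic out of the reducer itself.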

How does reduce function work?

五迷三道 · Submitted on 2019-11-27 00:41:56
Question: As far as I understand, the reduce function takes a list l and a function f. It then calls f on the first two elements of the list and then repeatedly calls f with the next list element and the previous result.

So, I define the following functions. The following function computes the factorial:

    def fact(n):
        if n == 0 or n == 1:
            return 1
        return fact(n-1) * n

    def reduce_func(x,y):
        return fact(x) * fact(y)

    lst = [1, 3, 1]
    print reduce(reduce_func, lst)

Now, shouldn't this
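The likely confusion is that after the first step the accumulator is already a *result*, yet `reduce_func` applies `fact` to it again. A sketch making that visible, and the usual fix of mapping `fact` once per element before folding (Python 3 spelling, with `reduce` imported from functools):

```python
from functools import reduce
from operator import mul

def fact(n):
    return 1 if n in (0, 1) else fact(n - 1) * n

lst = [1, 3, 1]

# reduce(reduce_func, lst) feeds the accumulator back into fact too:
# step 1 is fact(1)*fact(3) = 6, step 2 is fact(6)*fact(1) = 720.
buggy = reduce(lambda x, y: fact(x) * fact(y), lst)
assert buggy == fact(fact(1) * fact(3)) * fact(1)  # 720, not 6

# To multiply the factorials of the elements, apply fact exactly once
# per element, then fold with plain multiplication.
correct = reduce(mul, map(fact, lst))
print(correct)  # 6
```

Keeping the "transform each element" step (`map`) separate from the "combine results" step (`reduce`) avoids re-applying the transform to intermediate results.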