chunks

Spring Batch custom completion policy for dynamic chunk size

Submitted by ﹥>﹥吖頭↗ on 2019-12-02 01:58:07
Context: We have a batch job that replicates localized country names (i.e. translations of country names into different languages) into our DB from an external one. The idea is to process all localized names for a single country in one chunk (first chunk: all translations for Andorra; next chunk: all translations for the U.A.E.; and so on). We use a JdbcCursorItemReader for reading the external data, plus some Oracle analytic functions to provide the total number of translations available for each country, something like: select country_code, language_code, localized_name, COUNT(1) OVER(PARTITION BY c_lng
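In Spring Batch terms the question is after a custom CompletionPolicy whose isComplete returns true when the incoming item's country code changes. As a language-neutral illustration of that grouping rule (sketched in Python, since the Java classes involved are not shown in the excerpt; row layout is assumed from the SQL), the chunking logic amounts to:

```python
from itertools import groupby
from operator import itemgetter

def chunks_by_country(rows):
    """Yield one chunk per country: a chunk is 'complete' as soon as the
    country_code of the next row differs. Rows are assumed to arrive
    ordered by country_code, as a cursor with ORDER BY would return them."""
    for _country, group in groupby(rows, key=itemgetter(0)):
        yield list(group)

# Hypothetical sample rows: (country_code, language_code, localized_name)
rows = [
    ("AD", "en", "Andorra"),
    ("AD", "fr", "Andorre"),
    ("AE", "en", "United Arab Emirates"),
    ("AE", "de", "Vereinigte Arabische Emirate"),
    ("AE", "fr", "Emirats arabes unis"),
]
```

The COUNT(1) OVER(PARTITION BY ...) column from the query would let a Java completion policy know the chunk size up front instead of peeking at the next row.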

Indexing sequence chunks using data.table

Submitted by 好久不见 on 2019-12-02 00:41:43
Say I have a data set where sequences of length 1 are illegal, sequences of length 2 up to 5 are legal, and sequences longer than 5 are illegal, although a longer sequence may be broken up into sub-sequences of length <= 5. set.seed(1) DT1 <- data.table(smp = 1, R=sample(0:1, 20000, rep=TRUE), Seq = 0L) DT1[, smp:=1:length(smp)] DT1[, Seq:=seq(.N), by=list(cumsum(c(0, abs(diff(R)))))] This last line comes directly from: Creating a sequence in a data.table depending on a column. DT1[, fix_min:=ifelse((R==TRUE & Seq==1) | (R==FALSE), FALSE, TRUE)] fixmin_idx2 <- which(DT1[, fix_min==TRUE]) DT1[fixmin_idx2 - 1, fix_min:=TRUE] Now my
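The core operation the Seq counter enables, splitting each run of equal values into pieces of at most 5, can be shown in plain Python (this is an illustration of the rule described above, not data.table code; the function name is made up):

```python
from itertools import groupby

def split_runs(values, max_len=5):
    """Break a sequence of 0/1 values into runs of equal values, then
    split any run longer than max_len into pieces of <= max_len items,
    mirroring the 'break sequences longer than 5 into <=5 pieces' rule."""
    pieces = []
    for _, run in groupby(values):
        run = list(run)
        for i in range(0, len(run), max_len):
            pieces.append(run[i:i + max_len])
    return pieces
```

In data.table the equivalent split index would be something like (Seq - 1L) %/% 5L within each run.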

How do I process a text file in C by chunks of lines?

Submitted by 一个人想着一个人 on 2019-12-02 00:27:55
I'm writing a program in C that processes a text file and keeps track of each unique word (using a struct that holds a char array for the word and a count of its occurrences), storing these structs in a data structure. However, the assignment includes this requirement: "The entire txt file may be very large and not able to be held in the main memory. Account for this in your program." I asked the professor after class, and he said to read the text file X lines at a time (I think 20,000 was his suggestion), analyze them and update the structs, until you've reached the end of the file.
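The suggested approach, read a bounded number of lines, update the counts, repeat until EOF, can be sketched compactly in Python (the actual assignment is C with a struct per unique word; the function name and chunk size here are illustrative):

```python
from collections import Counter
from itertools import islice

def count_words(path, lines_per_chunk=20000):
    """Count word occurrences without holding the whole file in memory:
    read at most lines_per_chunk lines at a time, update the running
    counts, and loop until the file is exhausted."""
    counts = Counter()
    with open(path) as f:
        while True:
            chunk = list(islice(f, lines_per_chunk))
            if not chunk:          # EOF: no lines left to read
                break
            for line in chunk:
                counts.update(line.split())
    return counts
```

The C version would do the same loop with fgets into a fixed buffer, updating a hash table or sorted array of word structs per chunk.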

Move elements using array_chunk with PHP

Submitted by 一笑奈何 on 2019-12-01 21:20:07
I have a basic array which I am dividing into chunks of 3 elements with array_chunk. $array = array( 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' ); $chunk = array_chunk($array, 3); The result is as follows: [ ["a", "b", "c"], ["d", "e", "f"], ["g", "h"] ] (the last chunk has 2 elements). In the case where the last chunk has only 2 elements, how can I move an element down out of the first chunk so that the first chunk is the one with 2? It should look like this: [ ["a", "b"], ["c", "d", "e"], ["f", "g", "h"] ] (the first chunk has 2 elements). The easiest way is multiple executions of array_reverse: first reverse the
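Besides the array_reverse trick, the desired layout can be computed directly: figure out how many chunks array_chunk would produce, then deal the items into chunks of nearly equal size with the smaller chunks first. A Python sketch of that idea (the function name is made up; in PHP the same sizes could be fed to array_splice):

```python
def balanced_chunks(items, size):
    """Split items into ceil(len/size) chunks of nearly equal length,
    putting the smaller chunks first, so any short remainder lands at
    the front of the result instead of the back."""
    n_chunks = -(-len(items) // size)            # ceiling division
    base, extra = divmod(len(items), n_chunks)   # 'extra' chunks get one more item
    sizes = [base] * (n_chunks - extra) + [base + 1] * extra  # small chunks first
    out, i = [], 0
    for s in sizes:
        out.append(items[i:i + s])
        i += s
    return out
```

For the 8-element example with size 3 this yields sizes [2, 3, 3], matching the output the question asks for.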

Java - Read text file by chunks

Submitted by 流过昼夜 on 2019-11-30 16:08:11
I want to read a log file in separate chunks so I can process it with multiple threads. The application will run in a server-side environment with multiple hard disks. After splitting the file into chunks, the app will process every chunk line by line. I've accomplished reading a file line by line with a BufferedReader, and I can make chunks of the file with RandomAccessFile in combination with MappedByteBuffer, but combining the two isn't easy. The problem is that a chunk boundary cuts right through the last line of my chunk: I never have the whole last line of a block, so processing this last log
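A common fix is to choose provisional byte offsets and then slide each one forward to the next newline, so every chunk ends exactly at a line boundary and each thread reads [start, next_start). A Python sketch of that offset computation (the question itself uses RandomAccessFile and MappedByteBuffer in Java, where the same seek-then-scan-for-'\n' applies; the function name is illustrative):

```python
import os

def chunk_offsets(path, n_chunks):
    """Compute byte offsets that split a file into n_chunks pieces whose
    boundaries always fall just after a newline, so no chunk ends
    mid-line. Returns offsets including 0 and the file size."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for k in range(1, n_chunks):
            f.seek(k * size // n_chunks)   # provisional boundary
            f.readline()                   # skip the partial line it lands in
            pos = f.tell()
            if offsets[-1] < pos < size:   # drop empty/duplicate chunks
                offsets.append(pos)
    offsets.append(size)
    return offsets
```

Each worker then reads its own [offsets[i], offsets[i+1]) slice with an ordinary line reader and never sees a torn line.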

Return all possible combinations of a string when split into n strings

Submitted by 大憨熊 on 2019-11-30 15:25:58
I searched Stack Overflow for this but couldn't find a way to do it. It probably involves itertools. I want to find all the possible results of splitting a string, say the string thisisateststring, into n strings (of equal or unequal length; it doesn't matter, both should be included). For example, let n be 3: [["thisisat", "eststrin", "g"], ["th", "isisates", "tstring"], ............] Answer 1: Including empty strings in your results will be rather awkward with itertools.combinations(). It's probably easiest to write your own recursive version: def partitions(s, k): if not k: yield [s] return for i
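The quoted recursive generator is cut off mid-line. A self-contained completion consistent with its first lines (the exact body of the original answer is truncated, so the loop below is a reconstruction, not the answer verbatim):

```python
def partitions(s, k):
    """Yield every way to cut s at k positions, producing k+1 (possibly
    empty) substrings that concatenate back to s. k = n - 1 cuts gives
    all splits into n strings, empty pieces included."""
    if not k:
        yield [s]
        return
    for i in range(len(s) + 1):              # try every first-cut position
        for rest in partitions(s[i:], k - 1):
            yield [s[:i]] + rest
```

For a string of length L there are C(L + k, k) such splits, since the k cut positions may coincide (which is what produces the empty strings).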

Out of memory error when reading a csv file in chunks

Submitted by 眉间皱痕 on 2019-11-30 13:59:22
I am processing a csv file which is 2.5 GB in size. The 2.5 GB table looks like this:
columns = [ka, kb_1, kb_2, timeofEvent, timeInterval]
0: '3M' '2345' '2345' '2014-10-5' 3000
1: '3M' '2958' '2152' '2015-3-22' 5000
2: 'GE' '2183' '2183' '2012-12-31' 515
3: '3M' '2958' '2958' '2015-3-10' 395
4: 'GE' '2183' '2285' '2015-4-19' 1925
5: 'GE' '2598' '2598' '2015-3-17' 1915
And I want to group by ka and kb_1 to get a result like this:
columns = [ka, kb, errorNum, errorRate, totalNum of records]
'3M', '2345', 0, 0%, 1
'3M', '2958', 1, 50%, 2
'GE', '2183', 1, 50%, 2
'GE', '2598', 0, 0%, 1
(definition of an error record: when kb_1 != kb_2
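The counts and error rates are additive, so they can be accumulated row by row without ever loading the 2.5 GB file. A stdlib sketch of that streaming aggregation (function name is made up; with pandas the equivalent would be read_csv(..., chunksize=...) and summing the per-chunk groupby results the same way):

```python
import csv

def error_rates(lines):
    """Stream csv rows and aggregate per (ka, kb_1): total record count
    and 'error' count, where a row is an error when kb_1 != kb_2.
    Returns {(ka, kb_1): (errorNum, errorRate, totalNum)}."""
    totals, errors = {}, {}
    for row in csv.DictReader(lines):
        key = (row["ka"], row["kb_1"])
        totals[key] = totals.get(key, 0) + 1
        if row["kb_1"] != row["kb_2"]:
            errors[key] = errors.get(key, 0) + 1
    return {k: (errors.get(k, 0), errors.get(k, 0) / t, t)
            for k, t in totals.items()}
```

Only one dict entry per distinct (ka, kb_1) pair is held in memory, regardless of file size.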

Paging Python lists in slices of 4 items [duplicate]

Submitted by 佐手、 on 2019-11-30 08:46:37
Possible Duplicate: How do you split a list into evenly sized chunks in Python? mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9] I need to pass blocks of these to a third-party API that can only deal with 4 items at a time. I could do one at a time, but each call is an HTTP request plus processing, so I'd prefer to do it in the lowest possible number of queries. What I'd like to do is chunk the list into blocks of four and submit each sub-block. So from the above list, I'd expect: [[1, 2, 3, 4], [5, 6, 7, 8], [9]] mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9] print([mylist[i:i+4] for i in range(0, len(mylist), 4)])
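The slicing one-liner above only works on sequences with known length. A generator version of the same chunking handles any iterable, including streams whose length is unknown (Python 3.12 later added itertools.batched with similar behavior, yielding tuples):

```python
from itertools import islice

def batched(iterable, n):
    """Yield successive lists of up to n items from any iterable,
    producing the same blocks as mylist[i:i+n] slicing does for lists,
    but without requiring len() on the input."""
    it = iter(iterable)
    while True:
        block = list(islice(it, n))
        if not block:      # iterator exhausted
            return
        yield block
```

Each block can then be submitted as one API call, so 9 items cost 3 requests instead of 9.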
Possible Duplicate: How do you split a list into evenly sized chunks in Python? mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9] I need to pass blocks of these to a third party API that can only deal with 4 items at a time. I could do one at a time but it's a HTTP request and process for each go so I'd prefer to do it in the lowest possible number of queries. What I'd like to do is chunk the list into blocks of four and submit each sub-block. So from the above list, I'd expect: [[1, 2, 3, 4], [5, 6, 7, 8], [9]] mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9] print [mylist[i:i+4] for i in range(0, len(mylist), 4)] #