parallel-processing

GNU parallel: executing commands from a file, a few at a time

Submitted by 一笑奈何 on 2020-02-05 13:04:07
Question: I have a file called macse.cmd which contains 1000 commands to execute, one command per line. I want to use parallel to execute 30 at a time. I don't care in what order they run, as long as all of them do. I tried "parallel -j 30 ./macse.cmd", but this ran them one by one, and I am not even sure how to stop them. Adrian

P.S. The commands look like:

    java -jar -Xmx5000m ~/programs/macse_v1.01b.jar -prog alignSequences -seq M715_2100035271/all_unaligned.fasta -out_NT M715_2100035271/aligned_nt

Is an implicit execution context passed down to .par operations?

Submitted by 心已入冬 on 2020-02-05 04:58:34
Question: I have this situation:

- method a: an implicit ec is created
- method a: calls another method in a Future, i.e. Future(anotherMethod)
- anotherMethod, and all its subsequent calls, no longer have the ec from method a in scope.

Example code:

    class Foo {
      private implicit val ec: ExecutionContextExecutor =
        ExecutionContext.fromExecutor(
          Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors()))

      private val anotherClass = new Bar()

      def methodA() = Future(anotherClass.anotherMethod())
    }

I'm

Parallel processing in R

Submitted by 爷,独闯天下 on 2020-02-05 01:22:32
Question: I have R code that performs classification and estimation (using regression modelling) on 60 data sets with the random forest algorithm, and at the end there is a plot showing how a quantity evolves with time. I am performing a leave-one-out procedure on the same data, and since it takes a long time, I have parallelised it using the doSNOW package. I can see that the code works properly (I am storing the output of my cat commands in a separate log file). However, when I open
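The doSNOW pattern here, farming out one leave-one-out fit per worker, maps directly onto a process pool in Python, which may be a useful reference point for readers outside R. The "fit" below is a deliberately trivial placeholder (a mean over the remaining data sets), not the random-forest model from the question.

```python
from multiprocessing import Pool

def fit_without(args):
    """Placeholder fit: average all values from every dataset
    except the one being left out."""
    datasets, leave_out = args
    kept = [d for i, d in enumerate(datasets) if i != leave_out]
    values = [v for d in kept for v in d]
    return sum(values) / len(values)

def leave_one_out(datasets, processes=4):
    # one task per held-out dataset, run across the worker pool
    with Pool(processes) as pool:
        return pool.map(fit_without,
                        [(datasets, i) for i in range(len(datasets))])
```

As in the R version, each worker is independent, so the results come back in a deterministic order even though the fits run concurrently.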

Get statistics for a list of numbers using GPU

Submitted by 心不动则不痛 on 2020-02-04 09:25:06
Question: I have several lists of numbers in a file. For example:

    .333, .324, .123, .543, .00054
    .2243, .333, .53343, .4434

Now I want to get the number of times each number occurs, using the GPU. I believe this will be faster on the GPU than on the CPU because each thread can process one list. What data structure should I use on the GPU to easily get these counts? For example, for the input above, the answer would look as follows:

    .333 = 2 times in entire file
    .324 = 1 time
    etc.

I am looking for a
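For reference, the desired output is just a frequency histogram. On the CPU this is a one-liner with collections.Counter, shown below on the sample data from the question; a GPU version would typically build the same histogram with one atomic increment per element (e.g. atomicAdd in CUDA) rather than one whole list per thread.

```python
from collections import Counter

def count_numbers(lines):
    """Count how often each comma-separated number token occurs
    across all lines of the file."""
    tokens = [tok.strip() for line in lines for tok in line.split(",")]
    return Counter(t for t in tokens if t)

counts = count_numbers([".333, .324, .123, .543, .00054",
                        ".2243, .333, .53343, .4434"])
# '.333' occurs twice; every other value occurs once
```

Counting string tokens rather than floats sidesteps floating-point equality issues, which matters for values like .00054 as well as on the GPU.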

Parallelize a nested for loop in python for finding the max value

Submitted by 二次信任 on 2020-02-04 05:48:45
Question: I have been struggling for some time to improve the execution time of this piece of code. Since the calculations are really time-consuming, I think the best solution would be to parallelise the code. The output could also be stored in memory and written to a file afterwards. I am new to both Python and parallelism, so I find it difficult to apply the concepts explained here and here. I also found this question, but I couldn't manage to figure out how to implement the same thing for my situation. I am
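Since the question's loop body is not shown, the usual pattern can still be sketched generically: flatten the nested index ranges, split them into chunks, let each worker compute its local maximum, and reduce at the end. The scoring function below is a stand-in for the question's actual calculation.

```python
from multiprocessing import Pool

def score(pair):
    """Stand-in for the expensive per-(i, j) calculation."""
    i, j = pair
    return i * j - (i - j) ** 2

def chunk_max(pairs):
    # each worker reduces its own chunk to a single local maximum
    return max(score(p) for p in pairs)

def parallel_max(n, m, processes=4):
    pairs = [(i, j) for i in range(n) for j in range(m)]
    step = max(1, len(pairs) // processes)
    chunks = [pairs[k:k + step] for k in range(0, len(pairs), step)]
    with Pool(processes) as pool:
        # final reduction over the per-chunk maxima
        return max(pool.map(chunk_max, chunks))
```

Chunking matters: sending one (i, j) pair per task would drown the computation in inter-process overhead, whereas a handful of large chunks keeps every core busy.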

Running airflow tasks/dags in parallel

Submitted by 拟墨画扇 on 2020-02-03 04:05:49
Question: I'm using Airflow to orchestrate some Python scripts. I have a "main" DAG from which several subdags are run. My main DAG is supposed to run according to the following overview: I've managed to get this structure in my main DAG by using the following lines:

    etl_internal_sub_dag1 >> etl_internal_sub_dag2 >> etl_internal_sub_dag3
    etl_internal_sub_dag3 >> etl_adzuna_sub_dag
    etl_internal_sub_dag3 >> etl_adwords_sub_dag
    etl_internal_sub_dag3 >> etl_facebook_sub_dag
    etl_internal_sub_dag3 >> etl
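Whether the fan-out tasks after etl_internal_sub_dag3 actually run at the same time depends on the executor, not on the DAG shape: Airflow's default SequentialExecutor runs one task instance at a time no matter how the dependencies are drawn. Since the question is truncated, it is an assumption that this is the problem, but the commonly cited fix is switching to LocalExecutor (which needs a metadata database other than SQLite) and raising the concurrency limits. Illustrative values, not taken from the question:

```
# airflow.cfg
[core]
executor = LocalExecutor   # SequentialExecutor never runs tasks in parallel
parallelism = 32           # max task instances running across the installation
dag_concurrency = 16       # max task instances per DAG
```

With that in place, the four sibling subdags become eligible to run concurrently as soon as etl_internal_sub_dag3 succeeds.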

Python multiprocessing pool.map doesn't run in parallel

Submitted by 本小妞迷上赌 on 2020-02-03 02:14:11
Question: I wrote a simple parallel Python program:

    import multiprocessing as mp
    import time

    def test_function(i):
        print("function starts" + str(i))
        time.sleep(1)
        print("function ends" + str(i))

    if __name__ == '__main__':
        pool = mp.Pool(mp.cpu_count())
        pool.map(test_function, [i for i in range(4)])
        pool.close()
        pool.join()

What I expect to see in the output:

    function starts0
    function starts2
    function starts1
    function starts3
    function ends1
    function ends3
    function ends2
    function ends0

What I actually see

MySQL select request in parallel (python)

Submitted by 三世轮回 on 2020-02-03 02:08:32
Question: I saw a "similar" post, Executing MySQL SELECT * query in parallel, but my question is different, and that one has not been answered either, so I guess this is not a duplicate. I am trying to do a MySQL SELECT request in parallel. The reason is that I need the response fast. I managed to build the request when I parallelised the connection as well, but since the connection takes more time than the actual SELECT, it would be faster to connect once and do the SELECT in parallel. My approach:

    import