gnu-parallel

GNU Parallel: split file into children

Submitted by 泪湿孤枕 on 2021-02-18 11:34:12
Question:
Goal: use GNU Parallel to split a large .gz file into children. Since the server has 16 CPUs, create 16 children. Each child should contain at most N lines; here, N = 104,214,420. The children should be in .gz format.
Input: file name file1.fastq.gz; size 39 GB; line count 1,667,430,708 (uncompressed).
Hardware: 36 GB memory, 16 CPUs, HPCC environment (I'm not admin).
Code, version 1:

    zcat "${input_file}" | parallel --pipe -N 104214420 --joblog split_log.txt --resume-failed "gzip > ${input_file}…
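
A minimal sketch of the intended split, assuming the goal is numbered children (child1.gz … child16.gz) in the working directory; {#} is GNU Parallel's job-number replacement string:

    # Feed the decompressed stream to Parallel; each job receives at most
    # 104,214,420 lines and recompresses them into its own numbered child.
    zcat file1.fastq.gz |
      parallel --pipe -N 104214420 --joblog split_log.txt 'gzip > child{#}.gz'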

Optimising my script that does lookups in a big compressed file

Submitted by 六月ゝ 毕业季﹏ on 2021-02-10 05:56:29
Question: I'm here again! I would like to optimise my bash script in order to lower the time spent on each loop. Basically, what it does is: get a piece of information from a TSV, use that information to look it up with awk in a file, then print the matching line and export it. My issues are: 1) the files are 60 GB compressed files, so I need software to uncompress them (I'm actually trying to uncompress one now, and I'm not sure I'll have enough space); 2) the lookup is slow in any case. My ideas to improve it: 0) as said, if…
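
The usual remedy is to stop re-scanning the big file once per loop iteration and instead stream it a single time. A sketch, with hypothetical names lookup.tsv and big_file.gz, assuming the key is the first column of both files:

    # Load every key from the TSV into an awk array, then stream the
    # compressed file once, printing lines whose first field is a key.
    zcat big_file.gz | awk -F'\t' 'NR==FNR { keys[$1]; next } $1 in keys' lookup.tsv -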

How to run multiple curl requests in parallel with multiple variables

Submitted by 瘦欲@ on 2021-02-05 08:44:38
Question:
Set-up: I currently have the script below working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs, but as the ref file has grown larger and the data I am requesting via curl takes longer to generate, the script now takes too much time to complete.
Objective: I want to update this script so that curl requests and downloads multiple files as they become ready, as opposed to waiting for each file to be…
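
A sketch of the parallel version, assuming a hypothetical ref.txt whose tab-separated columns are a URL and an output file name; --colsep splits each input line into the replacement strings {1} and {2}:

    # Run up to 8 downloads at once; each line of ref.txt supplies one job.
    parallel -j8 --colsep '\t' 'curl -fsS -o {2} {1}' :::: ref.txt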

Multiple reads from a txt file in bash (parallel processing)

Submitted by 一世执手 on 2021-02-05 05:57:53
Question: Here is a simple bash script for checking HTTP status codes:

    while read url
    do
        urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
        echo "$url $urlstatus" >> urlstatus.txt
    done < "$1"

I am reading the URLs from a text file, but it processes only one at a time, taking too much time. GNU Parallel and xargs also processed one line at a time (tested). How can I process URLs simultaneously to improve the timing? In other words, I want threading over the URL file rather than over bash commands.
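
For reference, a sketch of the same check driven by GNU Parallel, one job per URL (the job count of 25 and the file name urls.txt are arbitrary assumptions):

    # Check up to 25 URLs concurrently; Parallel groups each job's output,
    # so the lines written to urlstatus.txt are never interleaved.
    parallel -j25 '
      status=$(curl -o /dev/null --silent --head --write-out "%{http_code}" --max-time 5 {})
      echo {} "$status"
    ' < urls.txt > urlstatus.txt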

GNU Parallel: assign one thread to each node (directories and sub*directories) of an entire tree from a start directory

Submitted by ε祈祈猫儿з on 2020-08-27 14:59:33
Question: I would like to benefit from the full potential of the parallel command on macOS (there seem to be two versions, GNU's and Ole Tange's, but I am not sure). With the following command:

    parallel -j8 find {} ::: *

I get a big performance gain if I am in a directory containing 8 subdirectories. But if all of those subdirectories have little content except for a single one, only one thread will work on the unique "big" directory. Is there a way to follow the parallelization…
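
One workaround is to enumerate nodes deeper in the tree before fanning out, so the single big directory is itself divided among jobs. A sketch; the fixed depth of 2 is an arbitrary assumption:

    # Fan out over sub-subdirectories rather than top-level entries, so one
    # large directory no longer pins all of its work on a single job slot.
    find . -mindepth 2 -maxdepth 2 -type d -print0 | parallel -0 -j8 find {} -type f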

How can I use the parallel command to exploit multi-core parallelism on my MacBook?

Submitted by 最后都变了- on 2020-08-09 13:56:13
Question: I often use the find command on Linux and macOS. I just discovered the parallel command, and I would like to combine it with find if possible, because find takes a long time when searching for a specific file in large directories. I have searched for this information, but the results are not precise enough. There appear to be a lot of possible syntaxes, but I can't tell which one is relevant. How do I combine the parallel command with the find command (or any other command) in…
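
A basic sketch of the combination; the pattern '*.log' and the choice of top-level directories as work units are assumptions for illustration:

    # Run one find per top-level directory, with one job slot per core
    # (sysctl -n hw.ncpu reports the core count on macOS).
    parallel -j"$(sysctl -n hw.ncpu)" find {} -name '*.log' ::: */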

GNU Parallel: nested parallelism

Submitted by 放肆的年华 on 2020-06-24 22:19:12
Question: Is it possible to call GNU Parallel from within multiple runs of a script that are in turn spawned by GNU Parallel? I have a Python script that runs for hundreds of sequential iterations, and somewhere within each iteration, 4 values are computed in parallel (using GNU Parallel). Now I want to spawn multiple such scripts at the same time, again using GNU Parallel. Is this possible? Will GNU Parallel take care of good utilization of the available cores? For example, if in the inner loop, out of…
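
Nesting does work, but GNU Parallel does not coordinate the two levels: the outer and inner -j values simply multiply. A sketch of the two levels, with hypothetical names (compute.py, compute_one_value.sh, run1…run4):

    # Outer level: run 4 copies of the script at once.
    parallel -j4 python compute.py ::: run1 run2 run3 run4

    # Inner level, invoked from inside compute.py on each iteration:
    # 4 jobs per script, so up to 4 x 4 = 16 concurrent processes.
    parallel -j4 ./compute_one_value.sh ::: a b c d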