gnu-parallel

GNU Parallel: split file into children

Submitted by 泪湿孤枕 on 2021-02-18 11:34:12
Question:
Goal: use GNU Parallel to split a large .gz file into children. Since the server has 16 CPUs, create 16 children. Each child should contain at most N lines; here, N = 104,214,420. The children should be in .gz format.
Input: file name file1.fastq.gz; size 39 GB; line count 1,667,430,708 (uncompressed).
Hardware: 36 GB memory, 16 CPUs, HPCC environment (I'm not admin).
Code, version 1:

    zcat "${input_file}" | parallel --pipe -N 104214420 --joblog split_log.txt --resume-failed "gzip > ${input_file}…
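
A minimal sketch of the intended split, assuming the goal is numbered children (child1.gz … child16.gz) in the working directory; {#} is GNU Parallel's job-number replacement string:

    # Feed the decompressed stream to Parallel; each job receives at most
    # 104,214,420 lines and recompresses them into its own numbered child.
    zcat file1.fastq.gz |
      parallel --pipe -N 104214420 --joblog split_log.txt 'gzip > child{#}.gz'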

Optimising my script that does lookups in a big compressed file

Submitted by 六月ゝ 毕业季﹏ on 2021-02-10 05:56:29
Question: I'm here again! I would like to optimise my bash script in order to lower the time spent on each loop. Basically, what it does is: get a piece of information from a TSV, use that information to look it up with awk in a file, then print the matching line and export it. My issues are: 1) the files are 60 GB compressed files, so I need software to uncompress them (I'm actually trying to uncompress one now, and I'm not sure I'll have enough space); 2) the lookup is slow in any case. My ideas to improve it: 0) as said, if…
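
The usual remedy is to stop re-scanning the big file once per loop iteration and instead stream it a single time. A sketch, with hypothetical names lookup.tsv and big_file.gz, assuming the key is the first column of both files:

    # Load every key from the TSV into an awk array, then stream the
    # compressed file once, printing lines whose first field is a key.
    zcat big_file.gz | awk -F'\t' 'NR==FNR { keys[$1]; next } $1 in keys' lookup.tsv -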

How to run multiple curl requests in parallel with multiple variables

Submitted by 瘦欲@ on 2021-02-05 08:44:38
Question:
Set-up: I currently have the script below working to download files with curl, using a ref file with multiple variables. When I created the script it suited my needs, but as the ref file has grown larger and the data I am requesting via curl takes longer to generate, the script now takes too much time to complete.
Objective: I want to update this script so that curl requests and downloads multiple files as they become ready, as opposed to waiting for each file to be…
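
A sketch of the parallel version, assuming a hypothetical ref.txt whose tab-separated columns are a URL and an output file name; --colsep splits each input line into the replacement strings {1} and {2}:

    # Run up to 8 downloads at once; each line of ref.txt supplies one job.
    parallel -j8 --colsep '\t' 'curl -fsS -o {2} {1}' :::: ref.txt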

Multiple reads from a txt file in bash (parallel processing)

Submitted by 一世执手 on 2021-02-05 05:57:53
Question: Here is a simple bash script for checking HTTP status codes:

    while read url
    do
        urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "${url}" --max-time 5)
        echo "$url $urlstatus" >> urlstatus.txt
    done < "$1"

I am reading the URLs from a text file, but it processes only one at a time, taking too much time. GNU Parallel and xargs also processed one line at a time (tested). How can I process URLs simultaneously to improve the timing? In other words, I want threading over the URL file rather than over bash commands.
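
For reference, a sketch of the same check driven by GNU Parallel, one job per URL (the job count of 25 and the file name urls.txt are arbitrary assumptions):

    # Check up to 25 URLs concurrently; Parallel groups each job's output,
    # so the lines written to urlstatus.txt are never interleaved.
    parallel -j25 '
      status=$(curl -o /dev/null --silent --head --write-out "%{http_code}" --max-time 5 {})
      echo {} "$status"
    ' < urls.txt > urlstatus.txt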

GNU Parallel: assign one thread to each node (directories and sub*directories) of an entire tree from a start directory

Submitted by ε祈祈猫儿з on 2020-08-27 14:59:33
Question: I would like to benefit from the full potential of the parallel command on macOS (there seem to be two versions, GNU's and Ole Tange's, but I am not sure). With the following command:

    parallel -j8 find {} ::: *

I get a big performance gain if I am in a directory containing 8 subdirectories. But if all of those subdirectories have little content except for a single one, only one thread will work on the unique "big" directory. Is there a way to follow the parallelization…
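
One workaround is to enumerate nodes deeper in the tree before fanning out, so the single big directory is itself divided among jobs. A sketch; the fixed depth of 2 is an arbitrary assumption:

    # Fan out over sub-subdirectories rather than top-level entries, so one
    # large directory no longer pins all of its work on a single job slot.
    find . -mindepth 2 -maxdepth 2 -type d -print0 | parallel -0 -j8 find {} -type f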

How can I use the parallel command to exploit multi-core parallelism on my MacBook?

Submitted by 最后都变了- on 2020-08-09 13:56:13
Question: I often use the find command on Linux and macOS. I just discovered the parallel command, and I would like to combine it with find if possible, because find takes a long time when searching for a specific file in large directories. I have searched for this information, but the results are not precise enough. There appear to be a lot of possible syntaxes, but I can't tell which one is relevant. How do I combine the parallel command with the find command (or any other command) in…
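
A basic sketch of the combination; the pattern '*.log' and the choice of top-level directories as work units are assumptions for illustration:

    # Run one find per top-level directory, with one job slot per core
    # (sysctl -n hw.ncpu reports the core count on macOS).
    parallel -j"$(sysctl -n hw.ncpu)" find {} -name '*.log' ::: */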

GNU Parallel: nested parallelism

Submitted by 放肆的年华 on 2020-06-24 22:19:12
Question: Is it possible to call GNU Parallel from within multiple runs of a script that are in turn spawned by GNU Parallel? I have a Python script that runs for hundreds of sequential iterations, and somewhere within each iteration, 4 values are computed in parallel (using GNU Parallel). Now I want to spawn multiple such scripts at the same time, again using GNU Parallel. Is this possible? Will GNU Parallel take care of good utilization of the available cores? For example, if in the inner loop, out of…
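
Nesting does work, but GNU Parallel does not coordinate the two levels: the outer and inner -j values simply multiply. A sketch of the two levels, with hypothetical names (compute.py, compute_one_value.sh, run1…run4):

    # Outer level: run 4 copies of the script at once.
    parallel -j4 python compute.py ::: run1 run2 run3 run4

    # Inner level, invoked from inside compute.py on each iteration:
    # 4 jobs per script, so up to 4 x 4 = 16 concurrent processes.
    parallel -j4 ./compute_one_value.sh ::: a b c d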