hpc

Long vectors stringdist package R

Posted by 旧时模样 on 2021-02-11 05:59:27
Question: I posted a question a few days ago; the solution seems to work in RStudio on Windows (though it takes forever and sometimes returns no results), but when I run the same code with 30 CPUs on an HPC I keep getting a "long vectors not supported" error. Any ideas why? Here is a sample of the data:

    > head(forfuzzy)
    # A tibble: 6 x 3
      grantee_name                  grantee_city grantee_state
      <chr>                         <chr>        <chr>
    1 (ICS)2 MAINE CHAPTER          CLEARWATER   FL
    2 (SUFFOLK COUNTY) VANDERBILT~  CENTERPORT   NY
    3 1 VOICE TREKKING A

Using parallel NetCDF to save a distributed 3D complex array

Posted by 北城余情 on 2021-02-08 03:46:28
Question: I have an MPI-based Fortran program that produces a 3D array of complex data at each node (sections of a 2D time series). I would like to use parallel I/O to write these arrays to a single file that can be opened relatively easily in Python for further analysis and visualization. Ideally the solution would be memory efficient (i.e. avoid creating intermediate temporary arrays). Using NetCDF, I have managed to adapt a subroutine which achieves this for a 3D array of

Which AVX and march should be specified on a cluster with different architectures?

Posted by 北慕城南 on 2021-02-07 14:40:39
Question: I'm currently trying to compile software for use on an HPC cluster with the Intel compilers. The login node, where I compile and prepare the computations, uses Intel Xeon Gold 6148 processors, while the compute nodes use either Haswell (Intel Xeon E5-2660 v3 / Intel Xeon E5-2680 v3) or Skylake processors (Intel Xeon Gold 6138). As far as I understand from the links above, my login node supports Intel SSE4.2, Intel AVX, Intel AVX2, as well as Intel AVX-512, but my compute
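A common resolution with the Intel compilers is to set the baseline instruction set to the oldest compute node and add a second, runtime-dispatched code path for the newer ones: `-x` fixes the baseline, `-ax` adds alternate paths. A sketch for this cluster (the source file name is illustrative):

```shell
# Baseline AVX2 path runs on the Haswell nodes; an additional AVX-512
# path is selected automatically at run time on the Skylake nodes.
icpc -O3 -xCORE-AVX2 -axCORE-AVX512 solver.cpp -o solver
```

Note that compiling with `-xHost` on the Skylake-class login node would bake in AVX-512 and fail with illegal-instruction errors on the Haswell nodes.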

Make use of all CPUs on SLURM

Posted by 空扰寡人 on 2021-01-27 19:52:00
Question: I would like to run a job on the cluster. Different nodes have different numbers of CPUs, and I have no idea which nodes will be assigned to me. What are the proper options so that the job can create as many tasks as there are CPUs across all nodes?

    #!/bin/bash -l
    #SBATCH -p normal
    #SBATCH -N 4
    #SBATCH -t 96:00:00
    srun -n 128 ./run

Answer 1: One dirty hack to achieve the objective is to use the environment variables provided by SLURM. For a sample sbatch file:

    #!/bin/bash
    #SBATCH --job-name=test
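The hack the answer hints at can be spelled out: `SLURM_JOB_CPUS_PER_NODE` describes the allocation in a compressed form such as `40(x2),32` (two 40-CPU nodes plus one 32-CPU node), and summing it gives the task count for `srun`. A hedged sketch (`./run` is the job binary from the question; the `--exclusive` flag claims whole nodes so all their CPUs are allocated):

```shell
#!/bin/bash -l
#SBATCH -p normal
#SBATCH -N 4
#SBATCH --exclusive
#SBATCH -t 96:00:00

# Expand a compressed CPU list like "40(x2),32" and sum it: 40*2 + 32 = 112.
count_cpus() {
    local total=0 part cpus mult
    for part in ${1//,/ }; do
        cpus=${part%%(*}             # CPUs per node in this group
        if [[ $part == *x* ]]; then
            mult=${part##*x}         # node count, e.g. "2)" ...
            mult=${mult%)}           # ... stripped to "2"
        else
            mult=1                   # plain entry: a single node
        fi
        total=$(( total + cpus * mult ))
    done
    echo "$total"
}

if [ -n "${SLURM_JOB_ID:-}" ]; then  # only launch inside a real allocation
    srun -n "$(count_cpus "$SLURM_JOB_CPUS_PER_NODE")" ./run
fi
```

This avoids hard-coding `-n 128`, which would be wrong whenever the assigned nodes do not add up to exactly 128 CPUs.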

Best block size value for block matrix matrix multiplication

Posted by 为君一笑 on 2020-12-30 01:38:32
Question: I want to do block matrix-matrix multiplication with the following C code. In this approach, blocks of size BLOCK_SIZE are loaded into the fastest cache in order to reduce memory traffic during the calculation.

    void bMMikj(double **A, double **B, double **C, int m, int n, int p, int BLOCK_SIZE) {
        int i, j, jj, k, kk;
        register double jjTempMin = 0.0, kkTempMin = 0.0;
        for (jj = 0; jj < n; jj += BLOCK_SIZE) {
            jjTempMin = min(jj + BLOCK_SIZE, n);
            for (kk = 0; kk < n; kk += BLOCK_SIZE) {
                kkTempMin = min(kk

How do I optimize the parallelization of Monte Carlo data generation with MPI?

Posted by 混江龙づ霸主 on 2020-08-10 20:16:36
Question: I am currently building a Monte Carlo application in C++ and I have a question about parallelizing it with MPI. The process I want to parallelize is the MC generation of data. To get good precision in my final results, I specify a goal number of data points. Each data point is generated independently, but can take a vastly different amount of time. How do I organize the parallelization and workload distribution of the data generation most efficiently?

What I have done so far

So far