Question
I have a Python script that I run on a SLURM cluster for multiple input files:
#!/bin/bash
#SBATCH -p standard
#SBATCH -A overall
#SBATCH --time=12:00:00
#SBATCH --output=normalize_%A.out
#SBATCH --error=normalize_%A.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=240000
HDF5_DIR=...
OUTPUT_DIR=...
NORM_SCRIPT=...
norm_func () {
    local file=$1
    echo "$file"
    python "$NORM_SCRIPT" -data "$file" -path "$OUTPUT_DIR"
}
# Doing normalization in parallel
for file in $HDF5_DIR/*; do norm_func "$file" & done
wait
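One thing to note: the loop above backgrounds one process per input file, so everything runs at once and peak memory scales with the number of files. A sketch of capping the concurrency with `xargs -P` instead (the limit of 4 is an assumed value to tune against `--mem`; `norm_func`, `HDF5_DIR`, `NORM_SCRIPT`, and `OUTPUT_DIR` are as defined above):

```shell
# Run at most 4 normalizations at a time instead of one per file.
export -f norm_func             # make the function visible to subshells
export NORM_SCRIPT OUTPUT_DIR   # variables norm_func reads

find "$HDF5_DIR" -maxdepth 1 -type f -print0 |
    xargs -0 -n 1 -P 4 bash -c 'norm_func "$1"' _
```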
The Python script just loads an scRNA-seq dataset, normalizes it, and saves the result as a .csv file. Its main lines of code are:
import csv

import h5py
import numpy as np

f = h5py.File(path_to_file, 'r')
rawcounts = np.array(rawcounts)  # materializes the whole matrix in memory
unique_code = np.unique(split_code)
for code in unique_code:
    mask = np.equal(split_code, code)
    curr_counts = rawcounts[:, mask]
    # Actual TMM normalization
    mtx_norm = gmn.tmm_normalization(curr_counts)
    # Writing the results into a .csv file
    csv_path = path_to_save + "/" + file_name + "_" + str(code) + ".csv"
    with open(csv_path, 'w', encoding='utf8') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        # flatten the header so each cell ID gets its own column
        writer.writerow([""] + list(cell_ids))
        for idx, row in enumerate(mtx_norm):
            writer.writerow([gene_symbols[idx]] + list(row))
I keep getting a "step memory exceeded" error for datasets larger than 10 GB, and I am not sure why. How can I change my .slurm script or Python code to reduce its memory usage? And how can I actually identify what causes the memory problem; is there a particular way of debugging memory in this case? Any suggestions would be greatly appreciated.
Answer 1:
You can get more fine-grained information by using srun to start the Python script:
srun python "$NORM_SCRIPT" -data "$file" -path "$OUTPUT_DIR"
Slurm will then create one 'step' per instance of your Python script and report information (errors, return codes, memory used, etc.) for each step independently in the accounting, which you can query with the sacct command.
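Once each step is recorded in the accounting, a query along these lines shows the peak resident memory per step (the job ID is a placeholder; exact field availability depends on how accounting is configured on your cluster):

```shell
# Per-step peak memory (MaxRSS) and exit status for job 123456
sacct -j 123456 --format=JobID,JobName,State,ExitCode,MaxRSS,Elapsed
```

The step whose MaxRSS approaches the `--mem` limit is the one to focus on.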
If configured by the administrators, use the --profile
option to get a timeline of the memory usage of each step.
In your Python script, you can use the memory_profiler module to get line-by-line feedback on memory usage.
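If installing memory_profiler is not an option on the cluster, the standard library's tracemalloc can serve a similar purpose; this is a swapped-in stdlib alternative, not memory_profiler itself. A minimal sketch, with a numpy allocation standing in for loading the counts matrix:

```python
import tracemalloc

import numpy as np

tracemalloc.start()

# Stand-in for loading the counts matrix; in the real script the
# np.array(...) call that copies the whole dataset is a likely culprit.
data = np.ones((1000, 1000))            # ~8 MB of float64

current, peak = tracemalloc.get_traced_memory()
# The largest allocation site, attributed to a source line.
top = tracemalloc.take_snapshot().statistics("lineno")[0]
print(f"peak: {peak / 1e6:.1f} MB, biggest allocator: {top}")
tracemalloc.stop()
```

Running this around the suspect section of the script points directly at the line responsible for the bulk of the allocations.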
Source: https://stackoverflow.com/questions/52229942/how-to-determine-at-which-point-in-python-script-step-memory-exceeded-in-slurm