Question
I have a python script that I am running on a SLURM cluster for multiple input files:
#!/bin/bash
#SBATCH -p standard
#SBATCH -A overall
#SBATCH --time=12:00:00
#SBATCH --output=normalize_%A.out
#SBATCH --error=normalize_%A.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=240000
HDF5_DIR=...
OUTPUT_DIR=...
NORM_SCRIPT=...
norm_func () {
    local file=$1
    echo "$file"
    python $NORM_SCRIPT -data $file -path $OUTPUT_DIR
}
# Doing normalization in parallel
for file in $HDF5_DIR/*; do norm_func "$file" & done
wait
The Python script just loads a dataset (scRNAseq), normalizes it, and saves the result as a .csv file. Some of the major lines of code in it are:
f = h5py.File(path_to_file, 'r')
rawcounts = np.array(rawcounts)
unique_code = np.unique(split_code)
for code in unique_code:
    mask = np.equal(split_code, code)
    curr_counts = rawcounts[:, mask]
    # Actual TMM normalization
    mtx_norm = gmn.tmm_normalization(curr_counts)
    # Writing the results into .csv file
    csv_path = path_to_save + "/" + file_name + "_" + str(code) + ".csv"
    with open(csv_path, 'w', encoding='utf8') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(["", cell_ids])
        for idx, row in enumerate(mtx_norm):
            writer.writerow([gene_symbols[idx], row])
I keep getting a "step memory exceeded" error for datasets that are above 10 GB, and I am not sure why. How can I change my .slurm script or Python code to reduce its memory usage? How can I identify what actually causes the memory problem? Is there a particular way of debugging memory usage in this case? Any suggestions would be greatly appreciated.
Answer 1:
You can get more fine-grained information by using srun to start the Python scripts:
srun python $NORM_SCRIPT -data $file -path $OUTPUT_DIR
Slurm will then create one 'step' per instance of your Python script and report information (errors, return codes, memory used, etc.) for each step independently in the accounting, which you can query with the sacct command.
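For instance, the parallel loop in your submission script could launch each Python instance as its own job step. A minimal sketch, assuming the same variables as above and a Slurm version where --exclusive gives each concurrent step its own set of the allocated CPUs:

for file in "$HDF5_DIR"/*; do
    srun --ntasks=1 --exclusive python "$NORM_SCRIPT" -data "$file" -path "$OUTPUT_DIR" &
done
wait

After the job has finished, the peak memory of every step can then be queried with something like:

sacct -j <jobid> --format=JobID,JobName,MaxRSS,MaxVMSize,Elapsed,State,ExitCode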
If it has been configured by the administrators, you can also use the --profile option to get a timeline of the memory usage of each step.
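For example (this assumes the acct_gather_profile HDF5 plugin is enabled on your cluster; the option name and the sh5util merge step are the standard Slurm profiling workflow, but the details can differ per site):

# in the submission script: sample CPU and memory usage of every task over time
#SBATCH --profile=task

# after the job has finished: merge the per-node profile files into one HDF5 file
sh5util -j <jobid>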
In your Python script itself, you can use the memory_profiler module to get feedback on its memory usage.
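A minimal sketch of how that could look, assuming the package is installed (pip install memory-profiler) and that the normalization code is wrapped in a function (normalize_file is a hypothetical name used here for illustration):

from memory_profiler import profile

@profile  # prints a line-by-line memory report when the function is executed
def normalize_file(path_to_file, path_to_save):
    # h5py loading, TMM normalization and CSV writing from the script above
    ...

Running the script as usual then shows, for every line of the decorated function, the memory in use and the per-line increment, which points directly at the allocation that exceeds the limit.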
Source: https://stackoverflow.com/questions/52229942/how-to-determine-at-which-point-in-python-script-step-memory-exceeded-in-slurm