Geocomputation at High Performance Computing Cluster (HPC) Anunna

You can log in to Anunna with the following command, replacing user_name with the login name that the Anunna system admin sent you. The terminal will then prompt you for the password that the system admin sent you.

ssh -X -Y user_name@login.anunna.wur.nl
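
To avoid retyping the full address, you can optionally add a host alias to the ~/.ssh/config file on your local machine. A minimal sketch (the alias name anunna is just an example):

# ~/.ssh/config on your local machine
Host anunna
    HostName login.anunna.wur.nl
    User user_name
    ForwardX11 yes
    ForwardX11Trusted yes

After this, "ssh anunna" is equivalent to the full command above.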

Setting up your home and software for Geocomputation analysis

cd $HOME
# create a folder for your scripts 
mkdir $HOME/scripts
cd $HOME/scripts
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/anunna_setting.sh
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc01_split_tif.sh
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02a_filter_tif_forloop.sh
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02b_filter_tif_xargs.sh
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02c_filter_tif_njobs.sh
wget http://www.spatial-ecology.net/ost4sem/exercise/hpc/sc02d_filter_tif_arrayjobs.sh
sed -i -e "s/insert_your_user/$USER/g" *  
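
To verify that sed replaced the placeholder in every script (no output means the substitution worked):

grep "insert_your_user" $HOME/scripts/*.sh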

Available storage on Anunna
/lustre/scratch/GUESTS/$USER : regularly cleaned up (files >1 month old will be removed)
/lustre/nobackup/GUESTS/$USER : extra cost
/lustre/backup/GUESTS/$USER : will be backed up (for extra cost!)
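
To check how much space you are using, standard Lustre and coreutils commands can be used (a minimal sketch; whether quota accounting is enabled depends on the cluster configuration):

# disk usage of your scratch folder
du -sh /lustre/scratch/GUESTS/$USER
# your Lustre quota, if quota accounting is enabled
lfs quota -u $USER /lustre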

Run anunna_setting.sh to copy the data, create directories, and copy the bash settings.
bash $HOME/scripts/anunna_setting.sh  
anunna_setting.sh
# create folders for standard error and standard output 
mkdir /lustre/scratch/GUESTS/$USER/stderr
mkdir /lustre/scratch/GUESTS/$USER/stdout 
 
# create soft link to scratch 
ln -s /lustre/scratch/GUESTS/$USER/  $HOME/scratch30
 
# copy data 
cp  -r /tmp/ost4sem $HOME/
 
# copy visualization tool 
 
mkdir $HOME/bin
cp -r /tmp/bin  $HOME
 
# copy bash setting
cp  /tmp/.bashrc_GUESTS  $HOME/.bashrc
 
# load the new bash setting
source $HOME/.bashrc

At this point your home should be configured to run Geocomputation procedures. The Geocomputation software (GRASS, PKTOOLS and GDAL) is loaded directly in your .bashrc. You can read it with “more $HOME/.bashrc”.
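
A quick way to verify that the tools are available after sourcing the new .bashrc (binary names may differ slightly between software versions):

which grass gdalinfo pkfilter
gdalinfo --version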

Filter an image

The status of jobs in SLURM can be checked with:

squeue --all
sacct
sinfo

Build some specific aliases and save them to $HOME/.bashrc, for example:

alias myq='squeue -u $USER   -o "%.9F %.10K %.4P %.80j %3D%2C%.8T %.9M  %.9l  %.S  %R"'

Prepare raster dataset

A portion of a Landsat image will be divided into 4 VRT tiles, each containing 3 bands. The VRTs will be used in the following scripting procedures.

sbatch /home/GUESTS/$USER/scripts/sc01_split_tif.sh
sc01_split_tif.sh
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc01_split_tif.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc01_split_tif.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc01_split_tif.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M
 
#### sbatch /home/GUESTS/$USER/scripts/sc01_split_tif.sh
 
 
DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat
 
gdalbuildvrt -overwrite -separate -te 36.5 -1.5 37 -1 $DIR/stack_UL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 36.5 -2 37 -1.5 $DIR/stack_LL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
 
gdalbuildvrt -overwrite -separate -te 37 -1.5 37.5 -1 $DIR/stack_UR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 37 -2 37.5 -1.5 $DIR/stack_LR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
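
After the job finishes, you can check that each VRT has the expected extent and the 3 bands, e.g.:

gdalinfo /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_UL.vrt | grep -e "Size is" -e "Band"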

sc02a Process 4 tiles in one node using one CPU with a bash for loop

This is the easiest procedure for performing a geocomputation operation. Launch a job that uses a normal for loop to iterate over the 4 tiles. After the iterations (pkfilter), the four tiles can be re-merged with gdalbuildvrt and gdal_translate.

sbatch /home/GUESTS/$USER/scripts/sc02a_filter_tif_forloop.sh
sc02a_filter_tif_forloop.sh
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02a_filter_tif_forloop.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02a_filter_tif_forloop.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02a_filter_tif_forloop.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M
 
#### sbatch /home/GUESTS/$USER/scripts/sc02a_filter_tif_forloop.sh
 
 
DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat
 
echo filter the stack_??.vrt files 
 
for file in $DIR/stack_??.vrt  ; do 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
done 
 
echo  re-create the large tif 
 
gdalbuildvrt -overwrite $DIR/stack.vrt   $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif  
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9  $DIR/stack.vrt $DIR/stack_filter.tif 
rm $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif $DIR/stack.vrt

sc02b Multi-process inside one node using 4 CPUs with xargs

This is one of the most efficient ways to perform a geocomputation operation. Launch a job that uses xargs to run the iterations on multiple cores (4 CPUs in this case). After the iterations (pkfilter), the 4 tiles can be re-merged with gdalbuildvrt and gdal_translate. Using xargs constrains all the iterations to one node, spread over different CPUs. The advantage is that when xargs finishes, all the tiles are ready to be merged back. A disadvantage is that if you request many CPUs (e.g. 24), you have to wait until one node has 24 CPUs free. A good compromise is to request 8-12 CPUs and add more time to the wall time (-t).
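
In the script below, xargs passes each file name as $1 to a small inline bash snippet (the trailing _ fills $0). A minimal standalone illustration of the idiom:

# print each file name from up to 4 parallel bash processes
ls $DIR/stack_??.vrt | xargs -n 1 -P 4 bash -c '
file=$1
echo "would process $file in process $$"
' _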

sbatch /home/GUESTS/$USER/scripts/sc02b_filter_tif_xargs.sh
sc02b_filter_tif_xargs.sh
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02b_filter_tif_xargs.sh
#SBATCH -n 1 -c 4 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02b_filter_tif_xargs.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02b_filter_tif_xargs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M
 
#### sbatch /home/GUESTS/$USER/scripts/sc02b_filter_tif_xargs.sh
 
 
export DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat
 
echo start the multicore computation
 
ls $DIR/stack_??.vrt | xargs -n 1 -P 4 bash -c $'  
file=$1
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
' _ 
 
echo  re-create the large tif 
 
gdalbuildvrt -overwrite $DIR/stack.vrt   $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif  
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9  $DIR/stack.vrt $DIR/stack_filter.tif 
rm $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif $DIR/stack.vrt

sc02c Process 4 tiles with 4 independent jobs - one node, one CPU

This is a good way to run 4 independent jobs. Each job performs one iteration. This option is good if you need to launch 100-200 jobs. You can also nest an xargs operation inside each job. The disadvantage is that each script finishes independently of the others, so the tiles can only be re-merged once all the jobs are done (a dependency-based approach is sketched after the script below).

for file in /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_??.vrt 
do sbatch --export=file=$file  /home/GUESTS/$USER/scripts/sc02c_filter_tif_njobs.sh 
done 
sc02c_filter_tif_njobs.sh
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02c_filter_tif_njobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02c_filter_tif_njobs.sh.%J.out
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02c_filter_tif_njobs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M
 
#### for file in /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_??.vrt  ; do sbatch --export=file=$file  /home/GUESTS/$USER/scripts/sc02c_filter_tif_njobs.sh ; done 
 
 
DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat
 
echo filter the $file file 
 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
 
echo  re-create the large tif by another script
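
The merge can be automated with a SLURM job dependency: capture the job IDs with sbatch --parsable and submit the merge commands as a job that starts only after all four filtering jobs have completed successfully. A sketch, assuming a hypothetical merge script sc03_merge_tif.sh that contains the gdalbuildvrt and gdal_translate commands:

ids=""
for file in /home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat/stack_??.vrt ; do
id=$(sbatch --parsable --export=file=$file /home/GUESTS/$USER/scripts/sc02c_filter_tif_njobs.sh)
ids=$ids:$id
done
# sc03_merge_tif.sh is hypothetical: it would run gdalbuildvrt + gdal_translate
sbatch --dependency=afterok$ids /home/GUESTS/$USER/scripts/sc03_merge_tif.sh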

sc02d Process 4 tiles with 1 job launching a 4-task array job - one node, one CPU

This is a good way to run independent array tasks. Each task performs one iteration. This option is good if you need to launch very many computations (e.g. 1000-2000). You can also nest an xargs operation inside each task. The disadvantage is that each task finishes independently of the others, so the tiles can only be re-merged once all the tasks are done (see the sketch after the script below).

sbatch /home/GUESTS/$USER/scripts/sc02d_filter_tif_arrayjobs.sh
sc02d_filter_tif_arrayjobs.sh
#!/bin/bash
#SBATCH -p GUESTS_Low
#SBATCH -J sc02d_filter_tif_arrayjobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /lustre/scratch/GUESTS/insert_your_user/stdout/sc02d_filter_tif_arrayjobs.sh.%A_%a.out 
#SBATCH -e /lustre/scratch/GUESTS/insert_your_user/stderr/sc02d_filter_tif_arrayjobs.sh.%A_%a.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=500M
#SBATCH --array=1-4
 
#### sbatch /home/GUESTS/$USER/scripts/sc02d_filter_tif_arrayjobs.sh
 
 
DIR=/home/GUESTS/$USER/ost4sem/exercise/KenyaGIS/Landsat
 
file=$(ls $DIR/stack_??.vrt  | head  -n  $SLURM_ARRAY_TASK_ID | tail  -1 )
 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
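
When scaling this up, two SLURM features are worth knowing (sketched here; sc03_merge_tif.sh is again a hypothetical merge script): the %N suffix on --array limits how many tasks run at the same time, and a dependency on the array job ID lets a merge job wait for all tasks to finish.

# run at most 50 of the 2000 array tasks at a time
#SBATCH --array=1-2000%50

# submit the array job, then a merge job that waits for all its tasks
jobid=$(sbatch --parsable /home/GUESTS/$USER/scripts/sc02d_filter_tif_arrayjobs.sh)
sbatch --dependency=afterok:$jobid /home/GUESTS/$USER/scripts/sc03_merge_tif.sh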