User Tools

Site Tools


wiki:anunna_setting

Geocomputation at High Performance Computing Cluster (HPC) Grace

Filter an image

Status of the jobs in slurm can be seen by:

squeue --all
sacct
sinfo

building up some specific alias and save to $HOME/.bashrc

alias myq='squeue -u $USER   -o "%.9F %.10K %.4P %.80j %3D%2C%.8T %.9M  %.9l  %.S  %R"'
alias err='ll -rt    /gpfs/scratch60/fas/sbsc/$USER/stderr/*'
alias errl='ll -rt    /gpfs/scratch60/fas/sbsc/$USER/stderr/* | tail '
alias errlless=' less  $(ls  -rt    /gpfs/scratch60/fas/sbsc/$USER/stderr/* | tail -1 ) '
alias errlmore=' more  $(ls  -rt    /gpfs/scratch60/fas/sbsc/$USER/stderr/* | tail -1 ) '
alias out='ll -rt    /gpfs/scratch60/fas/sbsc/$USER/stdout/*'
alias outl='ll -rt    /gpfs/scratch60/fas/sbsc/$USER/stdout/* | tail '
alias outlless=' less  $(ls  -rt    /gpfs/scratch60/fas/sbsc/$USER/stdout/* | tail -1 ) '
alias outlmore=' more  $(ls  -rt    /gpfs/scratch60/fas/sbsc/$USER/stdout/* | tail -1 ) '

Prepare raster dataset

A portion of a Landsat image

 wget http://www.spatial-ecology.net/ost4sem/exercise/KenyaGIS/Landsat/LT51680612010231MLK00_B123_proj.tar.gz
 tar -xvzf LT51680612010231MLK00_B123_proj.tar.gz
 ### download the scripts 
 wget -N -O sc01_split_tif.sh "http://www.spatial-ecology.net/dokuwiki/doku.php?do=export_code&id=wiki:anunna_setting&codeblock=0"
 wget -N -O sc02a_filter_tif_forloop.sh "http://www.spatial-ecology.net/dokuwiki/doku.php?do=export_code&id=wiki:anunna_setting&codeblock=1"
 wget -N -O sc02b_filter_tif_xargs.sh "http://www.spatial-ecology.net/dokuwiki/doku.php?do=export_code&id=wiki:anunna_setting&codeblock=2"
 wget -N -O sc02c_filter_tif_njobs.sh "http://www.spatial-ecology.net/dokuwiki/doku.php?do=export_code&id=wiki:anunna_setting&codeblock=3"
 wget -N -O sc02d_filter_tif_arrayjobs.sh "http://www.spatial-ecology.net/dokuwiki/doku.php?do=export_code&id=wiki:anunna_setting&codeblock=4"
 

will be divided in 4 vrt tiles each one containing 3 bands. The vrt will be used in the following scripting procedures.

sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc01_split_tif.sh
sc01_split_tif.sh
#!/bin/bash
#SBATCH -p day
#SBATCH -J sc01_split_tif.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /gpfs/scratch60/fas/sbsc/ga254/stdout/sc01_split_tif.sh.%J.out
#SBATCH -e /gpfs/scratch60/fas/sbsc/ga254/stderr/sc01_split_tif.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=4G
 
####### sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc01_split_tif.sh
 
module load GEOS/3.6.2-foss-2018a-Python-2.7.14 
module load GDAL/2.2.3-foss-2018a-Python-2.7.14
module load GSL/2.3-GCCcore-6.4.0
module load Boost/1.66.0-foss-2018a
module load PKTOOLS/2.6.7.6-foss-2018a 
module load Armadillo/8.400.0-foss-2018a
 
DIR=/gpfs/loomis/home.grace/$USER/geocomputation/Landsat
 
gdalbuildvrt -overwrite -separate -te 36.5 -1.5 37 -1 $DIR/stack_UL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 36.5 -2 37 -1.5 $DIR/stack_LL.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
 
gdalbuildvrt -overwrite -separate -te 37 -1.5 37.5 -1 $DIR/stack_UR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif
gdalbuildvrt -overwrite -separate -te 37 -2 37.5 -1.5 $DIR/stack_LR.vrt $DIR/LT51680612010231MLK00_B1_proj.tif $DIR/LT51680612010231MLK00_B2_proj.tif $DIR/LT51680612010231MLK00_B3_proj.tif

sc02a Proces 4 tiles in one node using a cpu with the bash for loop

This is the easiest procedure to perform a geocomputation operation. Lunch a job that use a normal for loop to iterate on the 4 tiles. After the iterations (pkfilter) the for tiles can be re-merged by gdalbuildvrt and gdal_translate.

sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02a_filter_tif_forloop.sh
sc02a_filter_tif_forloop.sh
#!/bin/bash
#SBATCH -p day
#SBATCH -J sc02a_filter_tif_forloop.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /gpfs/scratch60/fas/sbsc/ga254/stdout/sc02a_filter_tif_forloop.sh.%J.out
#SBATCH -e /gpfs/scratch60/fas/sbsc/ga254/stderr/sc02a_filter_tif_forloop.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=4G
 
####### sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02a_filter_tif_forloop.sh
 
module load GEOS/3.6.2-foss-2018a-Python-2.7.14 
module load GDAL/2.2.3-foss-2018a-Python-2.7.14
module load GSL/2.3-GCCcore-6.4.0
module load Boost/1.66.0-foss-2018a
module load PKTOOLS/2.6.7.6-foss-2018a 
module load Armadillo/8.400.0-foss-2018a
 
DIR=/gpfs/loomis/home.grace/$USER/geocomputation/Landsat
 
echo filter the stack_??.vrt files 
 
for file in $DIR/stack_??.vrt  ; do 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
done 
 
echo  re-create the large tif 
 
gdalbuildvrt -overwrite $DIR/stack.vrt   $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif  
GDAL_CACHEMAX=3000
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9  $DIR/stack.vrt $DIR/stack_filter.tif 
rm $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif $DIR/stack.vrt

sc02b Multi-process inside one node using 4 cpu using xargs

This is one of the most efficient ways to perform a geocomputation operation. Lunch a job that use xargs to compute the iterations in a multicore (4 cpu in this case). After the iterations (pkfilter) the 4 tiles can be re-merged by gdalbuildvrt and gdal_translate. The use of xargs allows to constrains all the iterations in one node using different cpus. The advantage is that after xargs all the tiles will be ready to be merged back. A disadvantage can be that in case you are requesting many cpu (e.g. 24) you have to wait that one node will have 24 cpu free. A good compromise can be just requested 8-12 cpu and add more time to the wall time (-t)

sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02b_filter_tif_xargs.sh
sc02b_filter_tif_xargs.sh
#!/bin/bash
#SBATCH -p day
#SBATCH -J sc02b_filter_tif_xargs.sh
#SBATCH -n 1 -c 4 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /gpfs/scratch60/fas/sbsc/ga254/stdout/sc02b_filter_tif_xargs.sh.%J.out
#SBATCH -e /gpfs/scratch60/fas/sbsc/ga254/stderr/sc02b_filter_tif_xargs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=4G
 
####### sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02b_filter_tif_xargs.sh
 
module load GEOS/3.6.2-foss-2018a-Python-2.7.14 
module load GDAL/2.2.3-foss-2018a-Python-2.7.14
module load GSL/2.3-GCCcore-6.4.0
module load Boost/1.66.0-foss-2018a
module load PKTOOLS/2.6.7.6-foss-2018a 
module load Armadillo/8.400.0-foss-2018a
 
export DIR=/gpfs/loomis/home.grace/$USER/geocomputation/Landsat
 
echo start the multicore computation
 
ls $DIR/stack_??.vrt | xargs -n 1 -P 4 bash -c $'  
file=$1
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
' _ 
 
echo  re-create the large tif 
 
gdalbuildvrt -overwrite $DIR/stack.vrt   $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif  
GDAL_CACHEMAX=10000
gdal_translate -co COMPRESS=DEFLATE -co ZLEVEL=9  $DIR/stack.vrt $DIR/stack_filter.tif 
rm $DIR/stack_UL.tif  $DIR/stack_LL.tif    $DIR/stack_UR.tif   $DIR/stack_LR.tif $DIR/stack.vrt

sc02c Proces 4 tiles with 4 indepent jobs - one node one cpu

This is a good way to run 4 independent jobs. Each job can perform one iteration. This option is good if need to lunch 100-200 jobs. You can also think that inside each job you can nest a xargs operation. The disadvantage is that each script will finish independently from the other so the only way to re-merge the tif is wait that all the jobs are finished.

for file in /gpfs/loomis/home.grace/$USER/geocomputation/Landsat/stack_??.vrt 
do sbatch --export=file=$file  /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02c_filter_tif_njobs.sh 
done 
sc02c_filter_tif_njobs.sh
#!/bin/bash
#SBATCH -p day
#SBATCH -J sc02c_filter_tif_njobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /gpfs/scratch60/fas/sbsc/ga254/stdout/sc02c_filter_tif_njobs.sh.%J.out
#SBATCH -e /gpfs/scratch60/fas/sbsc/ga254/stderr/sc02c_filter_tif_njobs.sh.%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=4G
 
####### for file in /gpfs/loomis/home.grace/$USER/geocomputation/Landsat/stack_??.vrt  ; do sbatch --export=file=$file  /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02c_filter_tif_njobs.sh ; done 
 
module load GEOS/3.6.2-foss-2018a-Python-2.7.14 
module load GDAL/2.2.3-foss-2018a-Python-2.7.14
module load GSL/2.3-GCCcore-6.4.0
module load Boost/1.66.0-foss-2018a
module load PKTOOLS/2.6.7.6-foss-2018a 
module load Armadillo/8.400.0-foss-2018a
 
DIR=/gpfs/loomis/home.grace/$USER/geocomputation/Landsat
 
echo filter the $file file 
 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
 
echo  re-create the large tif by another script

sc02c Proces 4 tiles with 1 job lunching 4-array-job - one node one cpu

This is a good way to run 4 independent jobs-array. Each job-array can perform one iteration. This option is good if you need to lunch many many computations (e.g. 1000-2000). You can also think that inside each job you can nest a xargs operation. The disadvantage is that each script will finish independently from the other so the only way to re-merge the tif is waiting that all the jobs are finished.

sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02d_filter_tif_arrayjobs.sh
sc02d_filter_tif_arrayjobs.sh
#!/bin/bash
#SBATCH -p day
#SBATCH -J sc02d_filter_tif_arrayjobs.sh
#SBATCH -n 1 -c 1 -N 1
#SBATCH -t 1:00:00 
#SBATCH -o /gpfs/scratch60/fas/sbsc/ga254/stdout/sc02d_filter_tif_arrayjobs.sh.%A_%a.out 
#SBATCH -e /gpfs/scratch60/fas/sbsc/ga254/stderr/sc02d_filter_tif_arrayjobs.sh.%A_%a.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email
#SBATCH --mem-per-cpu=4G
#SBATCH --array=1-4
 
####### sbatch /gpfs/loomis/home.grace/$USER/geocomputation/scripts/sc02d_filter_tif_arrayjobs.sh
 
module load GEOS/3.6.2-foss-2018a-Python-2.7.14 
module load GDAL/2.2.3-foss-2018a-Python-2.7.14
module load GSL/2.3-GCCcore-6.4.0
module load Boost/1.66.0-foss-2018a
module load PKTOOLS/2.6.7.6-foss-2018a 
module load Armadillo/8.400.0-foss-2018a
 
DIR=/gpfs/loomis/home.grace/$USER/geocomputation/Landsat
 
file=$(ls $DIR/stack_??.vrt  | head  -n  $SLURM_ARRAY_TASK_ID | tail  -1 )
 
filename=$(basename $file .vrt)
pkfilter -of GTiff  -dx 3 -dy 3  -f mean -co COMPRESS=DEFLATE -co ZLEVEL=9 -i $file -o  $DIR/$filename.tif 
wiki/anunna_setting.txt · Last modified: 2020/07/17 06:39 (external edit)