User Tools

Site Tools


wiki:amazon_zargs1

Transform a simple "for loop" in multicore "for loop"

Multi-core processor are very common used in desk-top and lab-top pc. If a pc has 8 processors you should open 8 programs or terminal to use all of them. But this is not an efficient way for stand-alone processing routines.

Let's consider that you want run a gdalwarp command to several tif files (*.tif), to perform a re-projection.

For Loop

You may run the full process by applying the loop in the usual way:

OUTDIR=$HOME/outdir
mkdir $OUTDIR
INDIR=/data/ost4sem/exercise/basic_adv_gdalogr/fagus_sylvatica
 
time  ( 
for file in $INDIR/20?0_TSSP_IP-ENS-A2-SP20_43023435.tif ; do 
    filename=$(basename $file .tif)
    gdalwarp -r cubic -tr 0.008 0.008 -t_srs EPSG:4326 $file  $OUTDIR/proj_$filename.tif -overwrite
done 
)

In this way only one processor will be used. The others will be just sleeping or eventually swap when the process is applied to another tif.

For example, the following picture shows the processing time during the for loop. As you can see only one processor, the red one, is working.


xargs loop

The xargs command allows to run a process splitting the task to several processors numerically defined by the user:

export OUTDIR=$HOME/outdir
export INDIR=/data/ost4sem/exercise/basic_adv_gdalogr/fagus_sylvatica
time  ( 
ls $INDIR/20?0_TSSP_IP-ENS-A2-SP20_43023435.tif  | xargs -n 1 -P 3  bash -c $'
file=$1
filename=$(basename $file .tif)
gdalwarp -r cubic -tr 0.008 0.008 -t_srs EPSG:4326 $file  $OUTDIR/proj_$filename.tif -overwrite
' _ 
)

The -n option identifies the argument. The argument is the variable imported inside the loop.
The argument is identifies with $1 and for a better understanding renamed to file (file=$1).
The -P option identify the processors (in this case 2) used to run the full line inside -c $' …… ' _

This will process
- the first tif in processor number 1 (orange line)
- the second tif in processor number 2 (red line)
than
- the third tif in processor number 1 (orange line)

For example, the following picture shows the processing time during the xargs loop. As you can see two processors are running simultaneously, and the whole processing time is shorter.


The use of more arguments is also possible

time  ( 
for file in $INDIR/20?0_TSSP_??-ENS-A2-SP20_43023435.tif  ; do for res in 0.008 0.08 ; do echo $file $res ; done ; done | xargs -n 2 -P 2 bash -c $'
file=$1
res=$2
filename=$(basename $file .tif)
gdalwarp -r cubic -tr $res $res -t_srs EPSG:4326 $file $OUTDIR/RES$res"_"$file -overwrite
)
' _

This will process
- the first tif with res = 0.008 in processor number 1
- the first tif with res = 0.08 in processor number 2
and then

- the second tif with res = 0.008 in processor number 1
- the second tif with res = 0.08 in processor number 2
and so on….

wiki/amazon_zargs1.txt · Last modified: 2021/01/20 20:36 (external edit)