{ "cells": [ { "cell_type": "markdown", "id": "317cdc9a", "metadata": {}, "source": [ "# Scaling-up: batch processing on the cluster with pyjeo" ] }, { "cell_type": "markdown", "id": "6b065cf8-358f-4925-a302-0798a3540db6", "metadata": {}, "source": [ "## Summary of computing concepts" ] }, { "cell_type": "markdown", "id": "8411b623-262f-4069-a3af-0cac1b39ba10", "metadata": {}, "source": [ "### High Performance (HPC) and High Throughput Computing (HTC)\n", "\n", "- High Performance (HPC): Tightly-coupled, parallel applications requiring dedicated software. Need for low-latency networks that are designed for passing short messages very quickly between compute nodes (Message Passing Interface). \n", "- High Throughput Computing (HTC): large number of loosely-coupled tasks (also called an embarrassingly parallel workload)." ] }, { "cell_type": "markdown", "id": "e12f1f6f-ed3f-4380-9bdd-6870540a0ce9", "metadata": {}, "source": [ "### Parallel processing\n", "\n", "- Embarrassingly parallel processing with tiling: Exercise 1\n", "- multi-core processing with openMP (multi-threading): Exercise 2" ] }, { "cell_type": "markdown", "id": "f1242454-f541-4e46-9a37-4b241d1b2eb1", "metadata": {}, "source": [ "## Embarrassingly parallel processing with tiling" ] }, { "cell_type": "markdown", "id": "f43edaec", "metadata": {}, "source": [ "## Using pyjeo docker image in Surf with EasyBuild" ] }, { "cell_type": "markdown", "id": "ccd13296", "metadata": {}, "source": [ "Step 1: load modules\n", "```\n", "module load Python/3.9.6-GCCcore-11.2.0\n", "module load LibTIFF/4.3.0-GCCcore-11.2.0\n", "module load libgeotiff/1.7.0-GCCcore-11.2.0\n", "module load uthash/2.3.0-foss-2021b\n", "module load shapelib/1.6.0-foss-2021b\n", "module load GSL\n", "module load GDAL\n", "module load jsoncpp/1.9.5-foss-2021b\n", "module load fann/2.2.0-foss-2021b\n", "module load SWIG/4.2.1-foss-2021b\n", "export PYTHONPATH=\"\"\n", "```" ] }, { "cell_type": "markdown", "id": "b214f7ac-007f-40e0-a54e-953c1191b615", "metadata": {}, "source": [ "Step 2: pip install wheels\n", "```\n", "python -m venv pyjeo-venv\n", "source $HOME/pyjeo-venv/bin/activate\n", "pip install numpy==1.26.4 --force-reinstall\n", "pip install /project/geocourse/Software/wheels/jiplib-1.1.3-py3-none-any.whl\n", "pip install /project/geocourse/Software/wheels/pyjeo-1.1.8-py3-none-any.whl\n", "```" ] }, { "cell_type": "markdown", "id": "8fafff11-d059-4618-8070-0ab79a5b8a45", "metadata": {}, "source": [ "Copy These 4 files to your local directory $HOME/scripts\n", "\n", "```\n", "cp /project/geocourse/Software/scripts/pyjeo_calculate_ndvi.sh $HOME/scripts/pyjeo_calculate_ndvi.sh\n", "cp /project/geocourse/Software/scripts/pyjeo_calculate_ndvi.py $HOME/scripts/pyjeo_calculate_ndvi.py\n", "cp /project/geocourse/Software/scripts/pyjeo_extract_parcels.sh $HOME/scripts/pyjeo_extract_parcels.sh\n", "cp /project/geocourse/Software/scripts/pyjeo_extract_parcels.py $HOME/scripts/pyjeo_extract_parcels.py\n", "```\n", "\n", "```\n", "wget https://raw.githubusercontent.com/selvaje/SE_data/master/exercise/PKTOOLS/pyjeo_calculate_ndvi.sh\n", "wget https://raw.githubusercontent.com/selvaje/SE_data/master/exercise/PKTOOLS/pyjeo_calculate_ndvi.py\n", "wget https://raw.githubusercontent.com/selvaje/SE_data/master/exercise/PKTOOLS/pyjeo_extract_parcels.sh\n", "wget https://raw.githubusercontent.com/selvaje/SE_data/master/exercise/PKTOOLS/pyjeo_extract_parcels.py\n", "```\n", "\n", "Replace `geocourse-teacher03` with your user name" ] }, { "cell_type": "markdown", "id": "91c495e1", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "id": "1aaa2165", "metadata": { "tags": [] }, "source": [ "### Exercise 1: Create NDVI from Sentinel-2 composite\n", "\n", "- Two spectral bands: red (B04), near infrared (B08)\n", "- Spatial region: Flanders (Belgium)\n", "- Acquisition time: July - August 2021 maximum NDVI Composite " ] }, { "cell_type": "markdown", "id": "85900a46", "metadata": {}, "source": [ "#### Tiling mechanism in pyjeo\n", "\n", "```python\n", "jim = pj.Jim('/path/to/large_image.vrt', tileindex = x, tiletotal = 64)\n", "```\n", "Will automatically cut the large image into tiles and load the xth tile" ] }, { "cell_type": "markdown", "id": "0114b915", "metadata": {}, "source": [ "You should tell the scheduler to run your script for each `tileindex` from `0` to `tileindex-1`" ] }, { "cell_type": "markdown", "id": "5fafed4e", "metadata": {}, "source": [ "Run the script as:\n", "\n", "`sbatch pyjeo_calculate_ndvi.sh`" ] }, { "cell_type": "markdown", "id": "df5e259e", "metadata": {}, "source": [ "#### Tips\n", "\n", "- write your script with command line arguments (argparse)\n", "- progam verbose mode to see what is going on\n", "- write to /scratch and move to your destination\n", "- clean up temporary files (if not automatic)\n", "- tile when possible (using the tiling mechanism in pyjeo)" ] }, { "cell_type": "markdown", "id": "d7a5deb7-368e-4d91-8470-b00e210fee1d", "metadata": { "tags": [] }, "source": [ "### Exercise 2: Extract polygons from raster image\n", "\n", "- 35317 polygons with parcel boundaries\n", "- Spatial region: Flanders (Belgium)\n", "- Calculate zonal statistics for each parcel (`[\"mean\", \"stdev\", \"sum\"]`)\n", "- Using multi-threading mechanism implemented in pyjeo" ] }, { "cell_type": "markdown", "id": "392c3ecf-68a8-44c4-8783-f2a74ff5bcca", "metadata": {}, "source": [ "Adapt `-c 1` for (1, 2, 3, 4, 5, 6, 7, 8)\n", "```bash\n", "#SBATCH -N 1 -c 1 -n 1\n", "```" ] }, { "cell_type": "markdown", "id": "67c42d95-eb64-4fa9-8cab-854072137590", "metadata": {}, "source": [ "Observe Amdahls law:\n", "Amdahl's law will always be a limiting factor for speedup:\n", "`1/((1-P) + P/N)`\n", "\n", "Where:\n", "- N No of cores in the gpu.\n", "- P: Parallelizable portion of code’s execution time of the program.\n", "- 1-P: serial code in the program." ] }, { "cell_type": "code", "execution_count": null, "id": "6bcedfaa-265a-49a5-bf5a-88a15065c4aa", "metadata": { "tags": [] }, "outputs": [], "source": [ "from matplotlib import pyplot as plt \n", "import numpy as np\n", "\n", "N = np.arange(1,8)\n", "ys = [1/((1-P/10) + P/10.0/N) for P in range(1, 9)]\n", "\n", "for i, y in enumerate(ys):\n", " plt.plot(N, y, label=str(i))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "93fec6f1-fa84-4ffe-953b-9e0b2de286fa", "metadata": {}, "source": [ "Estimate the speedup and P for the extract function" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }