GRAPPOLO: a supercomputer in a superbox

Grappolo is a project designed to teach cluster computing.
We have prototyped and developed a portable micro cluster computer that replicates the functioning of the biggest cluster computer facility in the south-west UK. This tool is similar to the Raspberry Pi cluster developed at Southampton University, but it is aimed at teaching Geographic Information Systems methods rather than raw computation. It is very low cost (~ £130), portable, and a perfect replica of the operating system running on a true high performance cluster computer.

This is still an ongoing project and it is not finished yet! We are working on the OS and software installation, and we will update this page as soon as grappolo works smoothly.

Grappolo in the press: check out the official Raspberry Pi magazine (issue 46, 2016), which documents a showcase use of Grappolo.

Team

Project team and collaborators:

  • Stefano Casalegno, University of Exeter (Project leader)
  • Andrew Cowley, University of Exeter (brainstorming - software / OS installation)
  • Oliver Hatfield, Falmouth University (box design and construction)
  • Andy Smith, Falmouth University (software ideas support / cabling)
  • Victoria O'Brien, Spatial Ecology (software OS / installation)
  • Giuseppe Amatulli, Yale University (software applications for teaching)

Goals

  • Build a micro cluster computer with the same operating system and software as a real HPC (High Performance Computer) for simulating big data processing.
  • Make it as small as possible, so that grappolo can fly in your hand luggage together with your laptop, book, toothbrush and newspaper.
  • Make it awesome, so that when you test supercomputing in a pub it attracts people's attention and they might want to learn more about it.
  • Make it low cost and open source, so that it is potentially replicable in schools or by any open-minded citizen willing to learn new amazing stuff.

List of material

We used 3 Raspberry Pis running a light distribution of the Linux operating system. Below you can find instructions for software installation and hardware assembly.
For the best box ever conceived in the history of boxes…
We used an awesome reddish acrylic Perspex and sawed an aluminium square box section (15×15).
Shopping list:

TOTAL HARDWARE COSTS: £136.75, shipping and VAT included

Besides the hardware costs, assembling and installing grappolo requires a lot of time and effort. If you would like us to assemble a grappolo for you, please contact us.

Template operating system

Grappolo consists of 3 Raspberry Pis, which we will call:

  1. master (Grid scheduler)
  2. node1 (slave or processing node1)
  3. node2 (slave or processing node2)

All three Raspberry Pis share a template operating system, which we prepare in the following steps. We will then clone the template operating system and guide you through customizing the clones into the master and the processing nodes.

Installing OS

  1. Check that your SD card is not faulty
  2. Download the Debian-derived Linux operating system Raspbian Jessie (Linux kernel 4.1, released Nov. 2015), write the image to the SD card, then enlarge the root partition so that it fills the card:
  df -h # find the partition of roughly 1.something GB (the Raspbian root partition)
  # in our case it is sdb2
  umount /dev/sdb2
  sudo parted /dev/sdb
  # inside parted:
  (parted) unit chs
  (parted) print
  (parted) rm 2
  (parted) mkpart primary 8,40,32 866,80,9 # keep the start shown by print for partition 2; the end depends on your SD card size
  (parted) quit
  sudo e2fsck -f /dev/sdb2 # check the file system before resizing it
  sudo resize2fs /dev/sdb2 # grow the file system to fill the enlarged partition
  3. Plug an Ethernet cable from raspi 1 into the home router
  4. Insert the SD card into the Raspberry Pi
  5. Power up raspi 1
  6. Open a terminal on a computer connected to the home router (via Wi-Fi or cable)

Check your IP address:

  ifconfig | grep "inet addr"
        inet addr:127.0.0.1  Mask:255.0.0.0
        inet addr:192.168.1.68  Bcast:192.168.1.255  Mask:255.255.255.0

Search for the IP address of the Raspberry Pi:

  sudo nmap -sP 192.168.1.1-255
  Starting Nmap 5.21 ( http://nmap.org ) at 2015-05-23 15:24 BST
     Nmap scan report for Unknown-28-32-c5-ae-4e-19.home (192.168.1.64)
     Host is up (0.0029s latency).
     MAC Address: 28:32:C5:AE:4E:19 (Unknown)
     Nmap scan report for ace.home (192.168.1.68)
     Host is up.
     Nmap scan report for Unknown-e8-06-88-9c-0f-66.home (192.168.1.70)
     Host is up (0.10s latency).
     MAC Address: E8:06:88:9C:0F:66 (Unknown)
     Nmap scan report for raspberrypi.home (192.168.1.80)
     Host is up (0.015s latency).
     MAC Address: B8:27:EB:65:6C:FF (Unknown)
     Nmap scan report for BThomehub.home (192.168.1.254)
     Host is up (0.013s latency).
     MAC Address: D0:84:B0:D7:AC:30 (Unknown)
     Nmap done: 255 IP addresses (5 hosts up) scanned in 5.44 seconds
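Reading the scan by eye works for a handful of hosts. As a minimal sketch, the Raspberry Pi can also be picked out automatically: the Raspberry Pi shown above has a MAC address starting with B8:27:EB, the vendor prefix of Raspberry Pi boards of this generation. The function name `find_raspi` is our own; it reads `nmap -sP` output on standard input and assumes the report layout shown above, where each "MAC Address:" line follows its "Nmap scan report for" line.

```shell
# find_raspi: print the IP address of every host in `nmap -sP` output
# whose MAC address starts with the Raspberry Pi prefix B8:27:EB.
# Hypothetical helper; assumes nmap's report layout shown above.
find_raspi() {
  awk '
    /Nmap scan report for/ {
      # the IP is the last field, possibly wrapped in parentheses
      ip = $NF
      gsub(/[()]/, "", ip)
    }
    /MAC Address: B8:27:EB/ { print ip }
  '
}

# Usage (nmap only reports MAC addresses when run as root):
# sudo nmap -sP 192.168.1.1-255 | find_raspi
```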

You are now ready to connect to raspi 1 via ssh; the default login and password are pi and raspberry.

  ssh pi@192.168.1.80
  yes # accept the host key when asked on the first connection

You are now logged in to the Raspberry Pi. The next steps are easy to do with the raspi-config menu:

  sudo raspi-config
  1. Change the password to masterpi
  2. Set the time zone: Internationalisation –> London
  3. Set the memory split to 16 (MB of GPU memory, in Advanced Options)
  4. Change the hostname from “raspberrypi” to “grappolo” (in Advanced Options)
  sudo apt-get update
  sudo apt-get upgrade
  sudo reboot
  ssh pi@192.168.1.80 # log in again with the new password
  

Add the user admin (administrator) and remove the user pi.
The password for user “admin” is “admin” (the old password for pi was masterpi).

sudo useradd -m admin
sudo passwd admin # admin
sudo nano /etc/group
# Go through the file adding ,admin to the end of every group that pi is in, e.g. "adm:x:4:pi,admin"
exit
ssh -X admin@192.168.1.80
chsh -s /bin/bash # make bash the default shell
sudo userdel pi
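Editing /etc/group by hand is error-prone. A minimal alternative sketch, assuming the standard `id` and `usermod` tools: list the groups pi belongs to and add admin to all of them except pi's own primary group. The helper name `groups_to_copy` is hypothetical, and it must be run before `sudo userdel pi`.

```shell
# groups_to_copy USER "GROUPS": turn a space-separated group list
# (as printed by `id -nG pi`) into the comma-separated list expected
# by `usermod -a -G`, dropping the user's own primary group.
groups_to_copy() {
  local user="$1" groups="$2" out="" g
  for g in $groups; do
    [ "$g" = "$user" ] && continue   # skip the per-user primary group
    out="${out:+$out,}$g"            # add a comma only between names
  done
  printf '%s\n' "$out"
}

# Usage (while user pi still exists):
# sudo usermod -a -G "$(groups_to_copy pi "$(id -nG pi)")" admin
```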

Set connections configurations

Connect the grappolo cluster to the outside world

The head node “grappolo” connects to the outside world with a dynamic IP address obtained via DHCP. Physically, this is done either via a Wi-Fi dongle or via a USB-Ethernet adapter, depending on our working environment.

Connect nodes within the grappolo cluster

Within the cluster we create a static IP network using the Ethernet ports of the 3 Raspberry Pis. These parameters are set in the /etc/network/interfaces configuration file. When connecting with the Wi-Fi dongle we need to insert the name and password of the Wi-Fi network; when connecting the cluster to the external world with a USB-Ethernet adapter, we need to uncomment the two eth1 lines instead. The grappolo configuration using the Wi-Fi dongle looks like this:

pi@grappolo ~ $ cat /etc/network/interfaces
auto lo
iface lo inet loopback
 
auto eth0
iface eth0 inet static
address 10.141.255.254
netmask 255.255.0.0
network 10.141.0.0
broadcast 10.141.255.255
#gateway 10.141.255.254
 
# To connect via usb ethernet adaptor instead of wifi, uncomment the next two lines
#auto eth1
#iface eth1 inet dhcp
 
allow-hotplug wlan0
auto wlan0
 
iface wlan0 inet dhcp
#wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
       wpa-ssid "NETWORK_NAME"
       wpa-psk "NETWORK_PASSWORD"
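The eth0 netmask 255.255.0.0 corresponds to the prefix length /16, the notation used later in /etc/exports (10.141.0.0/16). A minimal sketch in plain bash that counts the 1 bits of a dotted netmask; the function name is our own and it assumes a well-formed, contiguous netmask.

```shell
# netmask_to_prefix: convert a dotted-quad netmask (e.g. 255.255.0.0)
# into a CIDR prefix length (e.g. 16) by counting the 1 bits of each octet.
# Assumes a contiguous, well-formed netmask.
netmask_to_prefix() {
  local IFS=. bits=0 octet
  for octet in $1; do
    while [ "$octet" -gt 0 ]; do
      bits=$(( bits + (octet & 1) ))
      octet=$(( octet >> 1 ))
    done
  done
  echo "$bits"
}

# Usage:
# netmask_to_prefix 255.255.0.0     # internal cluster network (eth0)
# netmask_to_prefix 255.255.255.0   # home network (wlan0)
```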

Next, reboot and log in through the Wi-Fi dongle or the USB-Ethernet adapter, using the address assigned by the home router (192.168.1.81 in our case):

     sudo reboot
     ssh -X admin@192.168.1.81
     

The resulting connection configuration is the following:

pi@grappolo ~ $ ifconfig
eth0      Link encap:Ethernet  HWaddr b8:27:eb:b6:4b:a5  
          inet addr:10.141.255.254  Bcast:10.141.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:69 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:9100 (8.8 KiB)  TX bytes:3231 (3.1 KiB)
 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:72 errors:0 dropped:0 overruns:0 frame:0
          TX packets:72 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:6288 (6.1 KiB)  TX bytes:6288 (6.1 KiB)
 
wlan0     Link encap:Ethernet  HWaddr 00:87:31:13:37:aa  
          inet addr:192.168.1.81  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:221 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:104200 (101.7 KiB)  TX bytes:29337 (28.6 KiB)

To read your configuration information (address; netmask; network; broadcast; gateway) you can type:

   pi@grappolo ~ $ ifconfig | grep -A1 eth0 | tail -1  
        inet addr:10.141.255.254  Bcast:10.141.255.255  Mask:255.255.0.0
   pi@grappolo ~ $ netstat -nr | tail -2 | awk '{if(NR==1) print "Gateway "$2; else print "Destination "$1}'
             Gateway 0.0.0.0
             Destination 192.168.1.0
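Because `tail -2` keeps only the last two lines of the routing table, the command above reports the first of those two routes, a directly connected network whose Gateway column is 0.0.0.0. A sketch that selects the default route explicitly instead, assuming the usual `netstat -nr` column order (Destination, Gateway, Genmask, Flags, ...); the function name is hypothetical.

```shell
# default_gateway: read `netstat -nr` output on stdin and print the
# gateway of the default route (Destination 0.0.0.0 with the G flag set).
default_gateway() {
  awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2 }'
}

# Usage:
# netstat -nr | default_gateway
```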

In our setup the gateway is the home router, 192.168.1.254, and the destination network is 192.168.1.0. We use internal static IP addresses in the 10.141.x.x range to avoid any IP address overlap between the external DHCP network and the internal static cluster network.
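The point of the 10.141.0.0/16 range is that it cannot collide with the 192.168.1.0/24 home network. As a minimal sketch (function names are our own), two IPv4 networks overlap exactly when masking both network addresses with the shorter of the two prefixes gives the same value:

```shell
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# networks_overlap NET1 PREFIX1 NET2 PREFIX2
# Exits 0 if the two networks share at least one address, 1 otherwise.
networks_overlap() {
  local n1 n2 p mask
  n1=$(ip_to_int "$1"); n2=$(ip_to_int "$3")
  # compare both networks under the shorter (less specific) prefix
  p=$(( $2 < $4 ? $2 : $4 ))
  mask=$(( p == 0 ? 0 : (0xFFFFFFFF << (32 - p)) & 0xFFFFFFFF ))
  [ $(( n1 & mask )) -eq $(( n2 & mask )) ]
}

# Usage:
# networks_overlap 10.141.0.0 16 192.168.1.0 24 && echo overlap || echo "no overlap"
```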

Install Software

Grappolo is specifically designed to process spatio-temporal data; nonetheless, it can provide an excellent teaching platform for cluster processing of other, non-geographic data sources. We start by installing geographic information systems and geospatial libraries and tools:

Data processing software

PROJ.4 - GDAL/OGR - GRASS

sudo apt-get install grass grass-doc grass-dev

Openforis Geospatial toolkit

sudo apt-get install gcc g++ gdal-bin libgdal1-dev \
  libgsl0-dev libgsl0ldbl libproj-dev python-gdal \
  python-scipy python-tk python-qt4
# some of the above are already installed
wget foris.fao.org/static/geospatialtoolkit/releases/OpenForisToolkit.run
sudo chmod u+x OpenForisToolkit.run
sudo ./OpenForisToolkit.run

PKtools - Processing Kernel for geospatial data

sudo apt-get install pktools

AWK, Python and friends
https://www.python.org, www.scipy.org/, www.numpy.org/
AWK, Python, NumPy and SciPy are installed by default in Raspbian Jessie or as dependencies of the previous libraries.

R language and environment for statistical computing

wget http://cran.rstudio.com/src/base/R-3/R-3.1.2.tar.gz
mkdir R_HOME
mv R-3.1.2.tar.gz R_HOME/
cd R_HOME/
tar zxvf R-3.1.2.tar.gz
cd R-3.1.2/
sudo apt-get install gfortran libreadline6-dev libx11-dev libxt-dev
./configure
make
sudo make install
# get a cup of tea and start watching Battleship Potemkin: you have an hour to wait

Next we install R packages for spatial data analysis and machine learning modelling, using the CRAN Task Views "Spatial" (Analysis of Spatial Data) and "MachineLearning" as suggested in http://cran.r-project.org/web/views/

   sudo su
   R
   install.packages(pkgs="ctv", dependencies=TRUE)
   # we selected Bristol UK as CRAN repository
   library("ctv")
   update.views("Spatial")
   update.views("MachineLearning")

Other useful tools

Emacs editor

… for people like Giuseppe or Pieter who refuse to type on any other black squared thing with a cursor!

sudo apt-get install emacs

I like the locate command (from the mlocate package) for finding files quickly.

sudo apt-get install mlocate
sudo updatedb

Install screen to launch several processes in different terminals and leave them running in the background. Screen multiplexes a physical terminal between several processes (typically interactive shells).

  sudo apt-get install screen
  

Setting Up an NFS Server

The default installation of Grid Engine assumes that the executables corresponding to the sge commands are found in a $SGE_ROOT directory on a shared file system accessible by all hosts in the cluster.
For this purpose we use the Network File System (NFS), a distributed file system that allows a user on a client computer to access files over a network. We create and share the /usr/local/apps folder between the master and the slave nodes, so that the Grid Engine executables and configuration files installed there are also available on the slave nodes. We first install the NFS server on the master:

sudo apt-get install nfs-kernel-server portmap nfs-common
sudo nano /etc/exports
# append  "/usr/local/apps 10.141.0.0/16(rw,sync)" to the end of /etc/exports file
 
sudo service rpcbind restart
sudo /etc/init.d/nfs-kernel-server restart
/etc/init.d/nfs-kernel-server status # check the NFS service status
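Editing /etc/exports with nano works, but repeating the step would add a duplicate line. A minimal, idempotent sketch (the function name is hypothetical; the file path is a parameter so it can be tried on a scratch file first): append the export line only if it is not already present, then ask the NFS server to re-read its exports with `exportfs -ra`.

```shell
# add_export FILE LINE: append LINE to FILE unless an identical line exists.
add_export() {
  local file="$1" line="$2"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on the master, in a root shell (e.g. after `sudo su`):
# add_export /etc/exports "/usr/local/apps 10.141.0.0/16(rw,sync)"
# exportfs -ra   # re-export everything listed in /etc/exports
```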
Earlier versions of Debian (e.g. Wheezy) used the concept of run levels, which are now replaced by target units. If you are using Raspbian Wheezy, the run level needs to be switched from 2 to 3:
runlevel # check your current run level

Follow the steps below if you need to switch to a different Linux run level (a run level is a state of init and of the whole system that defines which system services are running). The Raspberry Pi is set by default to run level 2, which we treat as local multi-user with networking but without network services such as NFS, and we want to switch to run level 3, full multi-user with networking. Note that on Debian-based systems run levels 2 to 5 are configured identically by default. Once in run level 3, the two update-rc.d commands below enable automatic starting of the NFS server at reboot.

sudo nano /etc/inittab
# set the default run level to 3:
# replace the line id:2:initdefault: with id:3:initdefault:, so that it looks like this:
#   # The default runlevel.
#   id:3:initdefault:
sudo update-rc.d nfs-kernel-server defaults
sudo update-rc.d rpcbind defaults
sudo apt-get install nfs-kernel-server portmap nfs-common
sudo nano /etc/exports # add the export as above
sudo service portmap start
sudo reboot
# if NFS is still not running after the reboot, try the lines below
sudo apt-get purge rpcbind
sudo apt-get install nfs-kernel-server portmap nfs-common
sudo nano /etc/exports # edit the exports as above
sudo service portmap start # this did not actually work for us
sudo reboot

Later we will create a folder on the client node1 and mount it as a share of the corresponding folder on grappolo.
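On node1 this will typically amount to an NFS entry in /etc/fstab pointing at the master's static address 10.141.255.254 (the NFS client tools come from nfs-common, already installed in the template). A minimal sketch, assuming the same mount point /usr/local/apps on the client and default NFS mount options; the helper name is our own.

```shell
# nfs_fstab_entry SERVER EXPORT MOUNTPOINT: print an /etc/fstab line that
# mounts SERVER:EXPORT on MOUNTPOINT over NFS with default options.
nfs_fstab_entry() {
  printf '%s:%s %s nfs defaults 0 0\n' "$1" "$2" "$3"
}

# Usage on node1, in a root shell:
# mkdir -p /usr/local/apps
# nfs_fstab_entry 10.141.255.254 /usr/local/apps /usr/local/apps >> /etc/fstab
# mount /usr/local/apps
```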

Clone OS

Shut down grappolo, remove the SD card and insert it into a Linux machine. Copy the SD card image to the computer's disk, both as a backup and as the clone template.

   umount /dev/sdb1
   umount /dev/sdb2
   sudo dd bs=4M if=/dev/sdb of=jessie_and_softwares.img

This image is also useful as a backup of the OS template. Now copy the jessie_and_softwares.img template image onto two different SD cards: insert each new micro SD card in turn into a laptop (in our case it is mounted at /dev/sdb), unmount the card and copy the .img file using the dd command.

umount /dev/sdb1
umount /dev/sdb2
sudo dd bs=4M if=jessie_and_softwares.img of=/dev/sdb
sync # flush the write buffers before removing the card
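`dd` overwrites the whole card, so it is worth checking that nothing on the target device is still mounted (and that /dev/sdb really is the SD card). A minimal sketch with a hypothetical helper that lists the mounted partitions of a device from /proc/mounts-style input:

```shell
# mounted_parts DEVICE: read /proc/mounts-style lines on stdin and print
# every mounted source whose name starts with DEVICE (e.g. /dev/sdb1).
mounted_parts() {
  awk -v dev="$1" 'index($1, dev) == 1 { print $1 }'
}

# Usage before writing the image (it should print nothing):
# mounted_parts /dev/sdb < /proc/mounts
```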

Customize grid engine master and computing nodes

Next we explain how to customize the cloned SD cards for the master (job scheduler) and slave (computation) nodes.

Users accounts and data storage

Next we explain how to add multiple student accounts and add USB drives to the master and computing nodes to read and write data.

Hands on cluster processing with grappolo

Here we explain how to use grappolo for submitting data processing jobs into a grid engine.

wiki/grappolo.txt · Last modified: 2016/06/22 11:00 (external edit)