Exercise:

`kate ~/ost4sem/exercise/basic_adv_awk/adv_awk.sh &`

### Mathematical operation

Extract the maximum value:

```
cd ~/ost4sem/exercise/basic_adv_awk
awk '{ if (NR>1) { if ($4>max) max=$4 } } END { print max }' input.txt
```

Sum the values in a column:

`awk 'BEGIN { sum=0 } { sum=sum+$4 } END { print sum }' input.txt`

Sum the values in a column under a specific condition:

`awk '{ if ($3==7) sum=sum+$5 } END { print sum }' input.txt`

Sum the values in a column under two specific conditions:

`awk '{ if ($3==7 && $2==2005) sum=sum+$5 } END { print sum }' input.txt`
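A quick way to sanity-check these one-liners is to run them on a small hand-made table. The file and its column layout below are hypothetical (a year in column 2, a class code in column 3, values in columns 4 and 5), not part of the course data:

```sh
# Hypothetical 5-column table: id, year, class, value_a, value_b
printf 'id year class va vb\n1 2005 7 10 100\n2 2005 7 20 200\n3 2006 8 30 300\n' > /tmp/input.txt

# Maximum of column 4, skipping the header row
awk '{ if (NR>1) { if ($4>max) max=$4 } } END { print max }' /tmp/input.txt    # 30

# Sum of column 5 restricted to rows whose class (column 3) equals 7
awk '{ if ($3==7) sum=sum+$5 } END { print sum }' /tmp/input.txt               # 300
```

Note that the header row does not disturb the conditional sum: the string "class" never compares equal to the number 7.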

### A Bash/Awk tool for computing the average of columns based on CLASS/ID column

In this example we build a bash/awk tool that computes the average of columns grouped by a CLASS/ID column. We use the bash `sort` command to prepare the input.

`more  ~/ost4sem/studycase/fire_risk/sh_script/average.sh`

The file has to be sorted based on the CLASS/ID column! Sort the file, run the script, and reply to the questions:

```
sort -k 2,2 input.txt > input_s.txt
sh ~/ost4sem/studycase/fire_risk/sh_script/average.sh input_s.txt output.txt
```

The same file can be run in a macro script using the following commands:

```
sh ~/ost4sem/studycase/fire_risk/sh_script/average.sh input_s.txt output.txt <<EOF
n
2
2
EOF
```
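For comparison, the grouped-average idea can be sketched as a single non-interactive awk command. The column layout here is an assumption for illustration (value in column 1, CLASS/ID in column 2); because awk accumulates sums and counts in arrays keyed by the class, this sketch does not even require the file to be sorted:

```sh
# Hypothetical layout: value in $1, CLASS/ID in $2
printf '10 a\n20 a\n30 b\n' > /tmp/classes.txt

# Per-class mean: accumulate sum and count per key, divide at the end
awk '{ sum[$2]+=$1 ; n[$2]++ } END { for (k in n) print k, sum[k]/n[k] }' /tmp/classes.txt
```

This prints one `class mean` line per class (a 15, b 30 here); the order of the keys in the `for (k in n)` loop is not guaranteed.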

Building up a script to calculate the weighted average is even more involved:

`more  ~/ost4sem/studycase/fire_risk/sh_script/average_w.sh`
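The core of the weighted-average logic can be sketched in the same array style (again with a hypothetical layout: value in column 1, weight in column 2, CLASS/ID in column 3), accumulating the weighted sum and the total weight per class:

```sh
# Hypothetical layout: value in $1, weight in $2, CLASS/ID in $3
printf '10 1 a\n20 3 a\n30 2 b\n' > /tmp/weighted.txt

# Weighted mean per class: sum(value*weight) / sum(weight)
awk '{ ws[$3]+=$1*$2 ; w[$3]+=$2 } END { for (k in w) print k, ws[k]/w[k] }' /tmp/weighted.txt
```

For class a this gives (10*1 + 20*3) / (1+3) = 17.5, and for class b 30.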

### Query a dataset on the basis of the header name

We create an example table saved as test.csv:

```
echo ID,column_1,colum_b,colum_c,TOTAL > test.csv
echo 1,3,5,6,14 >> test.csv
echo 2,5,4,6,15 >> test.csv
echo 3,8,4,6,18 >> test.csv
```

In this example we also deal with a .csv file (comma-separated values text file). For this reason we use the option `-F,`, which tells awk that the input fields are separated by commas.

We create two subset files: the first holding the “ID” and “colum_b” columns, the second holding the “ID” and “TOTAL” columns.

```
for COLNAME in colum_b TOTAL ; do
awk -F, -v COLNAME=$COLNAME '{ if (NR==1) { for (col=1;col<=NF;col++) { if ($col==COLNAME) colprint=col } } else { print $1 "," $colprint } }' test.csv > $COLNAME.csv
done
```

### Subset a limited number of column headers and query the dataset on the basis of the header subset selection

We have a dataset with 100 columns and 3 rows:

```
echo `seq -s \, 1 1 100` > test.csv
echo `seq -s \, 2 20 2000` >> test.csv
echo `seq -s \, 2 200 20000` >> test.csv
awk -F, '{ print NF }' test.csv
awk -F, '{ print NR }' test.csv
```

We would like to select the first 5 columns, the 25th column, and the 90th to 94th columns out of the 100 columns (fields in awk) in our dataset. We redirect the list of selected header names into a “list_of_columns.csv” file:

`awk -F, '{ for(i=1;i<6;i++) if(NR==1) printf("%s",$i" ") ; if(NR==1) printf("%s",$25" ") ; for(i=90;i<95;i++) if(NR==1) printf("%s"," "$i) }' test.csv > list_of_columns.csv`
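As an aside, when the field positions are known in advance the same subset can be taken directly with `cut`; this is an alternative one-liner, not part of the course script:

```sh
# Rebuild the 100-column header row, then take fields 1-5, 25 and 90-94 with cut
echo `seq -s \, 1 1 100` > /tmp/test.csv
cut -d, -f1-5,25,90-94 /tmp/test.csv    # 1,2,3,4,5,25,90,91,92,93,94
```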

Finally we carry out a query of the original dataset on the basis of the columns previously selected.

```
for COLNAME in `cat list_of_columns.csv` ; do
echo $COLNAME
awk -F, -v COLNAME=$COLNAME '{ if (NR==1) { for (col=1;col<=NF;col++) { if ($col==COLNAME) colprint=col } } else { print $1 "," $colprint } }' test.csv > OUTPUT_subset_$COLNAME.csv
done
```

### Merge multiple files into multiple columns of one file

Files 1 to 3 each contain a single column; `pr -m -t -s,` pastes them side by side, comma-separated, and awk prepends a header line:

`pr -m -t -s, File1.txt File2.txt File3.txt | awk 'BEGIN{print "Column1,Column2,Column3"} {print $0}' > multiple_column_output_file.csv`
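A quick check with hypothetical one-column inputs, using `paste -d,` (which for single-column files produces the same side-by-side merge as `pr -m -t -s,`):

```sh
# Three hypothetical one-column files
printf '1\n2\n' > /tmp/File1.txt
printf 'a\nb\n' > /tmp/File2.txt
printf 'x\ny\n' > /tmp/File3.txt

# Merge the three one-column files into a three-column csv
paste -d, /tmp/File1.txt /tmp/File2.txt /tmp/File3.txt
# 1,a,x
# 2,b,y
```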