User Tools

Site Tools


wiki:awkadvance

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

wiki:awkadvance [2017/12/05 22:53] (current)
Line 1: Line 1:
 +====== Advance AWK scripts ======
 +Exercise:\\
 +Script: open by kate [[http://​www.spatial-ecology.net/​ost4sem/​exercise/​basic_adv_awk/​adv_awk.sh | ~/​ost4sem/​exercise/​basic_adv_awk/​adv_awk.sh ]]\\
 +Data: [[http://​www.spatial-ecology.net/​ost4sem/​exercise/​basic_adv_awk/​input.txt | ~/​ost4sem/​exercise/​basic_adv_awk/​input.txt ]]\\
 +Directory: ~/​ost4sem/​exercise/​basic_adv_awk\\
 +\\
 +<code bash>
 +kate ~/​ost4sem/​exercise/​basic_adv_awk/​adv_awk.sh &
 +</​code>​
  
 +==== Mathematical operation ====
 +Extract the maximum value:
 +<code awk>
 +cd ~/​ost4sem/​exercise/​basic_adv_awk
 +awk '{ if (NR>1) {if ($4>max) max= $4 }} END {print max }' ​ input.txt
 +</​code>​
 +Sum the values in a column:
 +<code awk>
 +awk ' BEGIN { sum=0 } { sum=sum+$4} END {print sum }' ​ input.txt
 +</​code>​
 +Sum the values in a column under a specific condition:
 +<code awk>
 +awk '{ if($3==7) sum=sum+$5} END {print sum }' ​ input.txt
 +</​code>​
 +Sum the values in a column under two specific conditions:
 +<code awk>
 +awk '{ if($3==7 && $2==2005 ​ ) sum=sum+$5} END {print sum }' ​ input.txt
 +</​code>​
 +
 +==== A Bash/Awk tool for computing the average of columns based on  CLASS/ID column ====
 +
 +In this example we build a bash/awk tool able to calculate the average of columns based on  CLASS/ID column.
 +we use the **sort** bash function  ​
 +
 +<code bash>
 +more  ~/​ost4sem/​studycase/​fire_risk/​sh_script/​average.sh
 +</​code>​
 +The file has to be sorted base on the  CLASS/ID column!!!
 +Sort the file, run the script, and replay to the questions:
 +<code bash>
 +sort -k 2,2  input.txt > input_s.txt ​
 +sh ~/​ost4sem/​studycase/​fire_risk/​sh_script/​average.sh input_s.txt output.txt
 +</​code>​
 +The same file can be run in a macro script using the following commands:
 +<code bash>
 +sh ~/​ost4sem/​studycase/​fire_risk/​sh_script/​average.sh input_s.txt output.txt <<EOF
 +n
 +2
 +2
 +EOF
 +</​code>​
 +
 +Even more complicated is building up a script to calculate the weighted average. ​
 +<code bash>
 +more  ~/​ost4sem/​studycase/​fire_risk/​sh_script/​average_w.sh
 +</​code>​
 +
 +
 +
 +==== Query a dataset on the basis of the header name  ====
 +
 +We create an example table saved as test.csv:
 +
 +<code bash>
 +echo ID,​column_1,​colum_b,​colum_c,​TOTAL > test.csv
 +echo 1,3,5,6,14 >> test.csv
 +echo 2,​5,​4,​6,​15 ​ >> test.csv
 +echo 3,​8,​4,​6,​18 ​ >> test.csv
 +</​code>​
 +
 +<note tip> In this example we also deal with .csv file (comma separated values text file). For this reason we need to use the option -F, to define our input file as csv.</​note>​
 +
 +We create two subset files corresponding to the "​ID"​ and to the "​column_b"​ (first file) and "​ID"​ and "​TOTAL"​ (second file).
 +
 +<code bash>
 +for COLNAME in  colum_b TOTAL ; do
 +awk -F, -v COLNAME=$COLNAME ' { if (NR==1) ​ { for (col=1;​col<​=NF;​col++) { if ($col==COLNAME) {colprint=col}}} ​ else  {print $1 ","​ $colprint }}  ' test.csv > $COLNAME.csv ​
 +done
 +</​code>​
 +
 +==== Subset a limited number of column headers and query the dataset on the basis of the header subset selection ​ ====
 +
 +We have a dataset with 100 columns and 3 rows
 +
 +<code bash>
 +echo `seq -s \, 1 1 100`  > test.csv
 +echo `seq -s \, 2 20 2000`  >> test.csv
 +echo `seq -s \, 2 200 20000` ​ >> test.csv
 +awk -F,  ' { print NF }' test.csv
 +awk -F,  ' { print NR }' test.csv
 +</​code>​
 +
 +We would like to select the first 5,  the 25th and the 90th to the 94th colums from the total 100 colums (Fields in AWK) in our dataset.
 +We redirect the list of the selection in a "​list_of_columns.csv"​ file
 +
 +<code bash>
 +awk -F,  ' {  for(i=1;​i<​6;​ i++) if(NR==1) printf("​%s",​$i"​ " ) ; if(NR==1) ​ printf("​%s",​$25"​ " ) ; for(i=90;​i<​95;​ i++) if(NR==1) printf("​%s","​ "​$i ​ )   ​}'​ test.csv > list_of_columns.csv
 +</​code>​
 +
 +Finally we carry out a query of the original dataset on the basis of the columns previously selected. ​
 +
 +<code bash>
 +for COLNAME in  `cat list_of_columns.csv` ; do
 +echo $COLNAME
 +awk -F, -v COLNAME=$COLNAME ' { if (NR==1) ​ { for (col=1;​col<​=NF;​col++) { if ($col==COLNAME) {colprint=col}}} ​ else  {print $1 ","​ $colprint }} ' test.csv > OUTPUT_subset_$COLNAME.csv ​
 +done
 +</​code>​
 +
 +====== Merge multiple files in multiple columns of one file  ======
 +
 +    pr -m -t -s, File1.txt File2.txt File3.txt ​ |  awk '​BEGIN{print "​Column1,​Column2,​Column3"​} {print$0}'​ > multiple_coulumn_output_file.csv
 +<note important>​File 1 to 3 have only one coulmn</​note>​
wiki/awkadvance.txt ยท Last modified: 2017/12/05 22:53 (external edit)