awk - Discrepancy in minimum value method
Linux novice here, first post. Please forgive any lack of clarity.
I've got what I think is a simple minimum value problem: a discrepancy between two different methods, awking file by file and awking using wildcards.
I have 20,000 files (and growing) in which I'd like to find the overall minimum value in the second column. The files all have the same prefix and are in directories one level below the executing script, so I'm using wildcards to do the task quickly.
Example:
awk 'min=="" || $2 < min {min=$2} END{print min}' */myfile.10*
It takes 14 seconds to execute, but it isn't finding the true minimum.
Alternatively, I stepped through each file of each directory, and this seems to find the correct minimum:
min=1000000000.0
for dir in `ls -d *run*/`; do
    minlocal=1000000000.0
    for file in `ls -1 ${dir}myfile.*`; do
        # The post as captured reads $genfile here; assumed to mean the loop variable $file.
        for val in `awk 'NR==1 {print $2}' $file`; do
            compare_result=`echo $minlocal" > "$val | bc`
            if [ $compare_result -eq 1 ]; then
                minlocal=$val
                fileminlocal=$file
                compare_result=`echo $min" > "$minlocal | bc`
                if [ $compare_result -eq 1 ]; then
                    min=$val
                    filemin=$file
                fi
            fi
        done
    done
    compare=`echo $min" > "$minlocal | bc`
    if [ $compare -eq 1 ]; then
        echo " Error finding lowest chi^2 in " $fileminlocal
        echo " Skipping..."
    else
        echo " Lowest value (" $minlocal ")found in " $fileminlocal
    fi
done
This approach finds the overall minimum correctly, but takes 4 minutes or so. I understand that looping through each of these files will take more time, but why does the task fail using wildcards?
Your awk script is doing a string instead of a numeric comparison on every min value, since the first statement in the script is explicitly a string comparison. To force a numeric comparison, change it to:
awk 'min=="" || $2 < min+0 {min=$2} END{print min}' */myfile.10*
awk treats all input as type numeric-string; it's how you use the input the first time that allows awk to figure out whether it's a number or a string.
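To see why the comparison type matters, here is a minimal sketch on made-up data (the three sample rows and the variable names data, string_min, and numeric_min are assumptions for illustration, not taken from the question). It forces each comparison mode explicitly, so the result should not depend on how a particular awk implementation happens to type min: concatenating "" forces a string comparison, adding 0 forces a numeric one.

```shell
# Hypothetical sample data: the second column holds the values to compare.
data=$(printf 'a 9.5\nb 10.5\nc 20.5\n')

# String comparison, forced by concatenating "" to both operands.
# Character by character, "10.5" sorts before "9.5" because "1" < "9".
string_min=$(printf '%s\n' "$data" |
    awk 'min == "" || ($2 "") < (min "") { min = $2 } END { print min }')

# Numeric comparison, forced by adding 0 to both operands.
numeric_min=$(printf '%s\n' "$data" |
    awk 'min == "" || ($2 + 0) < (min + 0) { min = $2 } END { print min }')

echo "string comparison minimum:  $string_min"
echo "numeric comparison minimum: $numeric_min"
```

With this data the string version settles on 10.5 while the numeric version finds 9.5, which is the same kind of wrong answer described in the question.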