# How can I do filtering between two matrices?

Owen 11/08/2018. 7 answers, 600 views

File1:

91  23  56  44  87  77
99  34  56  22  22  95
41  88  26  79  60  27
95  55  66  69  92  25

File2:

pass fail pass pass pass fail
pass fail pass fail fail pass
pass pass fail pass pass fail
pass pass fail pass pass fail

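I want to sum, for each row, the marks in File1 whose corresponding entry in File2 is "fail". For example, in the first row the fail columns hold 23 and 77, which gives 100. Here is the expected output.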

output:

100
78
53
91

How can I filter file1 based on the word "fail" in file2 in order to get the sum of the fail marks?

RudiC 11/08/2018.

I don't think you need an END section:

awk '
NR == FNR       {for (i=1; i<=NF; i++) F[i,NR] = $i
                 next
                }
                {T = 0
                 for (i=1; i<=NF; i++) T += ($i=="fail")?F[i,FNR]:0
                 print T
                }
' file[12]
100
78
53
91

Thor 11/08/2018.

I would use a matrix language for such a task, e.g. GNU Octave.

Assuming you converted the pass/fail file into numerical values, e.g.:

sed 's/pass/1/g; s/fail/0/g' passfail > passfail.nums

You can now do the following:

marks    = dlmread('marks');
passfail = dlmread('passfail.nums');

for i = 1:size(marks)(1)
  sum(marks(i,:)(passfail(i,:) == 0))
end

Output:

ans =  100
ans =  78
ans =  53
ans =  91
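For readers without Octave, the same masking idea can be sketched in plain Python. This is only an illustration under assumptions: the helper names `to_mask` and `masked_row_sums` are made up here, the pass/fail words are mapped to 1/0 exactly as the sed command does, and the sample rows are inlined instead of being read from the files.

```python
def to_mask(lines):
    """Map each pass/fail word to 1/0, mirroring the sed substitution above."""
    codes = {"pass": 1, "fail": 0}
    return [[codes[word] for word in line.split()] for line in lines]


def masked_row_sums(marks, mask):
    """For each row, sum the marks whose mask entry is 0 (that is, 'fail')."""
    return [
        sum(m * (1 - p) for m, p in zip(mark_row, mask_row))
        for mark_row, mask_row in zip(marks, mask)
    ]


# Sample data from the question, inlined so the sketch runs on its own.
marks = [
    [91, 23, 56, 44, 87, 77],
    [99, 34, 56, 22, 22, 95],
    [41, 88, 26, 79, 60, 27],
    [95, 55, 66, 69, 92, 25],
]
mask = to_mask([
    "pass fail pass pass pass fail",
    "pass fail pass fail fail pass",
    "pass pass fail pass pass fail",
    "pass pass fail pass pass fail",
])

for total in masked_row_sums(marks, mask):
    print(total)
```

Multiplying by `1 - p` keeps a mark only where the mask is 0, which plays the role of the logical index `passfail(i,:) == 0` in the Octave loop.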

Maxim 11/08/2018.

While I think using awk is good for portability, other languages seem easier to write and read for this task. GNU Octave was mentioned but does not come pre-installed on most machines. On the other hand, most systems have a version of Python preinstalled. Here is a Python version:

for marks, decisions in zip(open('file1').readlines(), open('file2').readlines()):
    row_score = 0
    for mark, decision in zip(marks.split(), decisions.split()):
        if decision == 'fail':
            row_score += int(mark)
    print(row_score)

which prints the output you expected.
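Note that `readlines()` reads each file into a list before the loop starts. If the files are large, a variant that iterates over the file objects directly keeps only one line of each in memory at a time. The sketch below is a possible way to write that; the function name `stream_fail_sums` is made up for illustration, and `io.StringIO` objects stand in for the real files so the example runs on its own.

```python
import io


def stream_fail_sums(marks_file, verdicts_file):
    """Yield, row by row, the sum of the marks whose verdict is 'fail'.

    Both arguments are iterated lazily, so only the current line of each
    input is held in memory.
    """
    for marks, decisions in zip(marks_file, verdicts_file):
        yield sum(
            int(mark)
            for mark, decision in zip(marks.split(), decisions.split())
            if decision == "fail"
        )


# In-memory stand-ins for file1 and file2 (first two rows of the sample data).
# With real files, pass open('file1') and open('file2') instead.
file1 = io.StringIO("91 23 56 44 87 77\n99 34 56 22 22 95\n")
file2 = io.StringIO("pass fail pass pass pass fail\npass fail pass fail fail pass\n")

for total in stream_fail_sums(file1, file2):
    print(total)
```

With real files, wrapping the two `open()` calls in a `with` statement would also make sure they are closed afterwards.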

jimmij 11/08/2018.

Here is my awk approach:

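awk 'NR==FNR{for(i=1;i<=NF;i++) a[NR"-"i]=$i; next}
{rows=FNR; for(j=1;j<=NF;j++) if($j=="fail") b[FNR]+=a[FNR"-"j]}
# walk the rows by number: for (k in b) has no guaranteed order and skips rows without a fail
END{for(k=1;k<=rows;k++) print b[k]+0}' file1 file2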

Awk has no true two-dimensional arrays, so this simulates one by combining the two numbers (row and field) into a single array index. The output is:

100
78
53
91
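The composite-index idea is not specific to awk. As a rough analogy, and only as a sketch, here is the same approach in Python with a dictionary keyed by a `(row, column)` tuple in place of the `NR"-"i` string. The function name `fail_sums_with_composite_keys` is invented for this illustration, and the inputs are passed as lists of lines rather than read from file1 and file2.

```python
def fail_sums_with_composite_keys(mark_lines, verdict_lines):
    """Store every mark under a (row, column) key, then look it up per verdict."""
    cells = {}  # (row, column) -> mark, like a[NR"-"i] in the awk version
    for row, line in enumerate(mark_lines, start=1):
        for col, value in enumerate(line.split(), start=1):
            cells[(row, col)] = int(value)

    totals = []
    for row, line in enumerate(verdict_lines, start=1):
        total = 0
        for col, verdict in enumerate(line.split(), start=1):
            if verdict == "fail":
                total += cells[(row, col)]
        totals.append(total)
    return totals


print(fail_sums_with_composite_keys(
    ["91 23 56 44 87 77", "99 34 56 22 22 95"],
    ["pass fail pass pass pass fail", "pass fail pass fail fail pass"],
))
```

Awk also accepts the subscript form `a[i,j]`, which joins the two values with the built-in `SUBSEP` separator; that is what the `F[i,NR]` index in RudiC's answer relies on.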

mosvy 11/08/2018.

awk '
BEGIN{ pf=ARGV[2]; ARGV[2]="" }
{ getline l <pf; split(l, a); n=0;
  for(i=1;i<=NF;i++) if(a[i]=="fail") n+=$i; print n }
' file1 file2
100
78
53
91

Just like @Maxim's python version, but unlike all the other answers, this is processing the two files in parallel, line by line, instead of loading one of them whole into memory.

Inian 11/08/2018.

I guess using an Awk script would make this requirement a bit easier to solve. Do something like the below. I guess it's a bit slower than jimmij's answer, which has just been posted.

#!/usr/bin/awk -f

FNR == NR {
    for(i=1;i<=NF;i++)
        if ($i == "fail")
            idxArray[FNR] = (idxArray[FNR]) ? (idxArray[FNR]" "i):(i)
    next
}{
    delete Array
    delete Line
    i=""
    j=""
    sum=0
    n=split(idxArray[FNR],Array," ")
    l=split($0,Line," ")
    for (i=1;i<=n;i++)
        for (j=1;j<=l;j++)
            if (Array[i] == j )
                sum += Line[j]
    print sum
}

and run the script as

awk -f script.awk file2 file1

RudiC 11/09/2018.

One-liner:

paste file[12] | awk '{T=0; for (i=1; i<=NF/2; i++) T += ($(i+NF/2)=="fail")?$i:0; print T}'
100
78
53
91
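
To make the index arithmetic in the paste one-liner easier to follow, here is a small Python sketch of the same idea: join each pair of lines the way `paste` would, then compare field `i` in the first half with field `i + NF/2` in the second half. The function name `paste_fail_sums` is made up for illustration, and the sketch assumes both files have the same number of fields on every line.

```python
def paste_fail_sums(mark_lines, verdict_lines):
    """Sum the first-half fields whose partner in the second half is 'fail'."""
    totals = []
    for marks, verdicts in zip(mark_lines, verdict_lines):
        # What paste would emit for this pair: the marks, a tab, the verdicts.
        fields = (marks.rstrip("\n") + "\t" + verdicts.rstrip("\n")).split()
        half = len(fields) // 2  # corresponds to NF/2 in the awk program
        totals.append(
            sum(int(fields[i]) for i in range(half) if fields[i + half] == "fail")
        )
    return totals


print(paste_fail_sums(
    ["41 88 26 79 60 27\n", "95 55 66 69 92 25\n"],
    ["pass pass fail pass pass fail\n", "pass pass fail pass pass fail\n"],
))
```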