Consider the following data:
$ cat data.txt
1,fruit,apple red,spherical
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
4,vegetable,peppers yellow,irregular
5,vegetable,peppers red,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined

The problem is to filter the lines where the third field ends in the word green. So the output should be

2,fruit,apple green,spherical
3,vegetable,peppers green,irregular

Short answer:- use awk with regular expression support
$ awk -F"," '{if ($3 ~ /\sgreen$/) print $0}' data.txt
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular

Long answer:-
Naive use of grep gives a lot of false positives.
$ grep green data.txt
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined

Line 6 should not be printed as the word green appears in the 4th column (and not the 3rd).
Lines 7, 8 and 10 have the word green in the 3rd field, but they should not be printed since green does not appear at the end of the field.
Line 9: the letters green are present in the third field but are not preceded by a space, so it should not be printed.
Line 11 is most likely a data error. The third field ends in the word green, but it is not associated with any fruit, vegetable, plant etc. Since nothing precedes green in that field, a test that requires a space before green also excludes it.
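For comparison, grep can be made to behave if the pattern is anchored to the third field explicitly. The following is only a sketch, assuming that no field ever contains an embedded comma. It recreates the sample data in a temporary file (the name produced by mktemp is arbitrary) so that it can be run on its own.

```shell
# Recreate the sample data in a temporary file so the example is self-contained.
data=$(mktemp)
cat > "$data" <<'EOF'
1,fruit,apple red,spherical
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
4,vegetable,peppers yellow,irregular
5,vegetable,peppers red,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined
EOF

# ^[^,]*,[^,]*,   skips the first two fields
# [^,]* green,    requires the third field to end in " green" before the next comma
anchored=$(grep -E '^[^,]*,[^,]*,[^,]* green,' "$data")
echo "$anchored"

rm -f "$data"
```

On the sample data this prints only lines 2 and 3. The pattern is noticeably harder to read than a test on a single field, which is the main argument for awk here.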
Further, to print all the spherical objects, one can use

$ awk -F"," '{if ($4=="spherical") print $0}' data.txt
1,fruit,apple red,spherical
2,fruit,apple green,spherical
10,seed,green pea,spherical

Here a full match on the 4th field is performed. However, this trick cannot be extended to the present problem, as only partial matches on the third field are desired.
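To see why a full match does not carry over, the same kind of comparison can be tried on the third field. A small sketch, using a shortened copy of the sample data in a temporary file (the file name is arbitrary):

```shell
# A shortened copy of the sample data, enough to show the difference.
data=$(mktemp)
cat > "$data" <<'EOF'
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
7,plant,green spinach,leaves
11,unknown,green,undefined
EOF

# Exact comparison: true only when the whole third field is "green".
exact=$(awk -F"," '{if ($3=="green") print $0}' "$data")
echo "$exact"

rm -f "$data"
```

Only line 11 is selected. Lines 2 and 3, the ones that are actually wanted, are missed because their third field contains more than the word green.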
How the solution works:-
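$ awk -F"," '{if ($3 ~ /\sgreen$/) print $0}' data.txt

-F"," sets the comma as the field separator
$3 refers to the third field
~ tests for a match
/ ... / delimiters of the regular expression
\s tests for a whitespace character (a GNU awk extension)
$ tests for the end of the field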
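Since \s is a GNU awk extension, other awk implementations may not understand it. A possible variant, offered as a sketch and not tested on those implementations, puts a literal space in the pattern and uses the shorter pattern-only form, where the default action is to print the line. The sample data is recreated in a temporary file (arbitrary name) so that the snippet runs on its own.

```shell
# Recreate the sample data in a temporary file so the example is self-contained.
data=$(mktemp)
cat > "$data" <<'EOF'
1,fruit,apple red,spherical
2,fruit,apple green,spherical
3,vegetable,peppers green,irregular
4,vegetable,peppers yellow,irregular
5,vegetable,peppers red,irregular
6,vegetable,broccoli,irregular and green
7,plant,green spinach,leaves
8,plant,very green spinach,leaves
9,plant,verygreenspinach,leaves
10,seed,green pea,spherical
11,unknown,green,undefined
EOF

# Pattern-only form: a rule without an action prints the matching line.
# A literal space replaces \s, so a tab before "green" would not match.
portable=$(awk -F, '$3 ~ / green$/' "$data")
echo "$portable"

rm -f "$data"
```

On the sample data this selects the same two lines as the original command. If tabs can occur in the data, the POSIX class [[:space:]] is an alternative to the literal space, provided the awk in use supports character classes.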
Tested on Debian Wheezy using
$ awk --version
GNU Awk 4.0.1
Copyright (C) 1989, 1991-2012 Free Software Foundation.