Making diff ignore specific text

When using diff(1) to compare various directories, I find that sometimes the only difference is that some files contain a $FreeBSD line such as

Code:
# $FreeBSD: releng/12.2/usr.sbin/periodic/etc/daily/100.clean-disks 193302 2009-06-02 07:35:51Z brian $

otherwise all the files are identical.

Is there any way to get diff(1) to ignore such differences?
 
man diff(1)
Code:
     -I pattern --ignore-matching-lines pattern
             Ignores changes, insertions, and deletions whose lines match the
             extended regular expression pattern.  Multiple -I patterns may be
             specified.  All lines in the change must match some pattern for
             the change to be ignored.  See re_format(7) for more information
             on regular expression patterns.

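Applied to the $FreeBSD case, the -I option might be used roughly as in the sketch below. The directory names (old, new) and the ident strings are made up for illustration, and the pattern `\$FreeBSD` is only one way to match the line; the `$` is escaped because it is otherwise a regex anchor.

```shell
# Build two throwaway trees (hypothetical names) that differ only in the $FreeBSD line.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf '%s\n' '# $FreeBSD: example/path 1 brian $' 'echo hello' > "$work/old/script"
printf '%s\n' '# $FreeBSD: example/path 2 brian $' 'echo hello' > "$work/new/script"

# -r recurses into directories; -I ignores a change only if every line in it matches.
if diff -r -I '\$FreeBSD' "$work/old" "$work/new"; then
    result="identical apart from FreeBSD ident lines"
else
    result="real differences found"
fi
echo "$result"
rm -rf "$work"
```

Note that a change mixing a $FreeBSD line with an adjacent real edit is still reported in full, since not all of its lines match the pattern.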

If that doesn't suffice, just send the diff output through sed(1).
 
Or send the two diff inputs through sed.
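
A minimal sketch of that pre-filtering idea, assuming it is acceptable to drop every line containing $FreeBSD before comparing. The file names and contents here are invented for the example.

```shell
# Two sample files (made-up contents) that differ only in the $FreeBSD line.
work=$(mktemp -d)
printf '%s\n' '# $FreeBSD: example 1 $' 'echo hello' > "$work/a"
printf '%s\n' '# $FreeBSD: example 2 $' 'echo hello' > "$work/b"

# Delete every line containing $FreeBSD from each input, then diff the filtered copies.
sed '/\$FreeBSD/d' "$work/a" > "$work/a.filtered"
sed '/\$FreeBSD/d' "$work/b" > "$work/b.filtered"

if diff "$work/a.filtered" "$work/b.filtered" > /dev/null; then
    filtered_result="same"
else
    filtered_result="different"
fi
echo "$filtered_result"
rm -rf "$work"
```

One trade-off: line numbers in any diff output then refer to the filtered copies, not the original files. In shells with process substitution (bash, zsh) the temporary files can be avoided with something like `diff <(sed ... a) <(sed ... b)`.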

If you just need a line-by-line diff, which doesn't know how to handle lines being added/removed, that's super easy to implement in the scripting language du jour (Python/Perl/Ruby/...), and you can then add a little domain-specific preprocessing there.
 

Of course, pre-filtering the input is even more elegant.

As for the scripting language: when it comes to plain and simple text mangling, awk is still by far the fastest and most straightforward solution (if sed isn't enough).
I once tried to re-implement in Perl a quickly hacked-together awk script that mangles CSV files with ~500k lines and ~20 fields and does some conversions and calculations on the way. The (IMHO) clean and elegant Perl script did it in ~15 seconds, the awk script in ~5 seconds and, after some small optimizations, in 2-3 seconds on the same machine...
 
Thumbs up to awk.

For diffing several files, awk would work, but it is not a "natural fit": awk really shines when processing exactly one input file. For example:
Code:
BEGIN { total_inches = 0 }
# Convert each measurement (first field, in inches) to cm and keep a running total.
{ total_inches += $1; print $1 * 2.54 " cm" }
END { print "Total is " total_inches " inches or " total_inches * 2.54 " cm" }
This converts a list of measurements in inches to cm and prints the total. If you have to process multiple files at once, awk gets a little squirrely.
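
To illustrate the multi-file case, here is a rough sketch of a line-by-line comparison of two files in awk that skips $FreeBSD lines. It relies on the common `NR == FNR` idiom to tell the first file from the second; the file names and contents are invented, and the sketch deliberately does not handle inserted or removed lines, an empty first file, or a first file that is longer than the second.

```shell
# Two sample files (made-up contents): the ident line and line 2 differ.
work=$(mktemp -d)
printf '%s\n' '# $FreeBSD: example 1 $' 'echo hello' 'exit 0' > "$work/a"
printf '%s\n' '# $FreeBSD: example 2 $' 'echo goodbye' 'exit 0' > "$work/b"

# NR == FNR holds only while awk reads the first file, so that pass just stores lines.
# While reading the second file, each line is compared with the stored line of the same number.
report=$(awk '
    NR == FNR { first[FNR] = $0; next }
    /\$FreeBSD/ && first[FNR] ~ /\$FreeBSD/ { next }   # ident line on both sides: skip
    $0 != first[FNR] { print FNR ": " first[FNR] " -> " $0 }
' "$work/a" "$work/b")
echo "$report"
rm -rf "$work"
```

The bookkeeping needed just to know which file a line came from is a fair example of the squirreliness: the single-file inches-to-cm script needs none of it.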
 