find replace question -- kind of

Weird question. I have a file with several thousand lines in it. Each line is a path to a file. Is there a simple way to remove all lines that don't include "/a/b/c/" in them.

After I do this I want to compare a.txt against b.txt and output only the lines that are not in both files.
 
Not that weird. If you need to edit in place you can use ed(1) or ex(1). If you want to script it, use sed(1). Even vi(1) can help if you need a visual editor.

Read the respective man pages to find out how to invert your search with a regular expression.

To compare the two files, diff(1) will help you.
 
Hi,

I suggest you abandon that screen editor, and apply yourself to ed(1) for every editing session during the next month.

Your slow initial progress will be rewarded. You may even join the club for what Ritchie and Thompson described as "salvation through suffering".

When your stint is finished, you'll be an expert in ed(1), sed(1), and regular expressions, not to mention the ":" operator in vi(1). You'll also discover that the arrow keys are for monkeys.

Failing that:

Code:
sed -n -e '\;^/a/b/c/;p' <file1 >file1a
The caret ("^") anchors to the left, so leave it out if the "/a/b/c/" is not at the start of the line.

Cheers,
 
I'm not entirely sure, but
Code:
cat a.txt a.txt b.txt | sort | uniq -u # if -u is correct, sort correct ...
is a trick I stumbled upon a while back. I'm unsure whether it fully answers the latter part of the first post, and I have no time to re-test. Because a.txt is listed twice, none of its lines survive uniq -u, so as far as I remember it shows the lines that exist in b.txt but not in a.txt. I used it extensively.
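
For anyone who wants to check the trick before relying on it, here is a small sketch on made-up data. The scratch directory and the sample paths are hypothetical, and it assumes neither file contains duplicate lines of its own (a line repeated inside b.txt would be dropped by uniq -u as well).

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' /a/b/c/one /a/b/c/two > "$tmpdir/a.txt"
printf '%s\n' /a/b/c/two /a/b/c/three > "$tmpdir/b.txt"

# a.txt is fed in twice, so each of its lines occurs at least twice
# and uniq -u (print only non-repeated lines) drops all of them.
only_in_b=$(cat "$tmpdir/a.txt" "$tmpdir/a.txt" "$tmpdir/b.txt" | sort | uniq -u)

# Feeding each file once keeps the lines that occur in exactly one file.
in_one_only=$(cat "$tmpdir/a.txt" "$tmpdir/b.txt" | sort | uniq -u)

printf 'only in b.txt:\n%s\n' "$only_in_b"
printf 'in exactly one file:\n%s\n' "$in_one_only"
rm -rf "$tmpdir"
```

Listing each file once looks like the closer match for "lines that are not in both files", but it no longer tells you which file a line came from.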
 
kclark said:
Weird question. I have a file with several thousand lines in it. Each line is a path to a file. Is there a simple way to remove all lines that don't include "/a/b/c/" in them.

No need to muck around with ed, sed, etc.

Code:
grep '/a/b/c' originalfile > otherfile
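
A quick way to see which lines grep keeps and which it drops, sketched on made-up data (the scratch directory and file contents are hypothetical). The -F flag is my addition: it makes grep treat the pattern as a fixed string rather than a regular expression, which seems safer for paths.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' /a/b/c/keep.txt /x/y/z/drop.txt /usr/a/b/c/also-keep.txt \
    > "$tmpdir/originalfile"

# Lines that contain the string: this is what remains after
# "removing all lines that don't include /a/b/c/".
kept=$(grep -F '/a/b/c/' "$tmpdir/originalfile")

# -v inverts the match and yields the lines that would be thrown away.
dropped=$(grep -v -F '/a/b/c/' "$tmpdir/originalfile")

printf 'kept:\n%s\n' "$kept"
printf 'dropped:\n%s\n' "$dropped"
rm -rf "$tmpdir"
```

Note that the match is not anchored, so /usr/a/b/c/also-keep.txt is kept too. Use grep '^/a/b/c/' (without -F) if the string must be at the start of the line.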
 
jalla said:
No need to muck around with ed, sed, etc.

Code:
grep '/a/b/c' originalfile > otherfile

Should be:

Code:
grep -v '/a/b/c' originalfile > otherfile

to get the lines that do not contain /a/b/c as asked in the original post.

Anyway, this is of course possible with pretty much any text editor available on Unix (Emacs for instance), but the command line is usually the right and most automated way of doing such text manipulation.
For more complex text manipulation Perl can come in handy.
 
fluca1978 said:
Should be:

Code:
grep -v '/a/b/c' originalfile > otherfile

to get the lines that do not contain /a/b/c as asked in the original post.

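You may want to read the original post again ;) It asks to remove the lines that don't include "/a/b/c/", so the matching lines are the ones to keep, and plain grep without -v does that.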
 
Use comm(1) to show lines in one file but not the other. This command will output two tab-separated columns: lines only in a.txt and lines only in b.txt. Column 3 contains lines in both files and is suppressed by the -3 option. I have a hard time remembering it's subtractive, not additive. The only requirement is that both files need to be sorted.
Code:
comm -3 a.txt b.txt

shells/bash's process substitution can be used to combine grep with comm in a single command. The first Google hit for "bash process substitution" contains an example using comm: http://tldp.org/LDP/abs/html/process-sub.html. Remember, both inputs need to be sorted.
Code:
comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt)

Since comm uses a tab to separate the two columns, sed can be used to strip the leading tab that comm puts in front of the lines from b.txt. With -3 the lines from a.txt carry no tab at all, so the second expression, which removes a trailing tab, is only a safeguard. Use ctrl-v <tab> to enter a literal tab on the command line; it is shown as TAB below.
Code:
comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt) | sed -e 's/^TAB//; s/TAB$//'
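
If process substitution is not available, the same pipeline can be sketched in plain sh with intermediate files. The scratch directory, sample paths and variable names below are made up for illustration; printf '\t' is used so that no literal tab has to be typed.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' /a/b/c/one /a/b/c/two /x/y/z/skip > "$tmpdir/a.txt"
printf '%s\n' /a/b/c/two /a/b/c/three > "$tmpdir/b.txt"

# comm needs sorted input, so filter and sort into intermediate files.
grep -F '/a/b/c/' "$tmpdir/a.txt" | sort > "$tmpdir/a.sorted"
sort "$tmpdir/b.txt" > "$tmpdir/b.sorted"

# Capture a literal tab without having to type one.
tab=$(printf '\t')

# -3 suppresses the lines common to both files; sed then strips the
# leading tab that marks the lines found only in the second file.
diff_lines=$(comm -3 "$tmpdir/a.sorted" "$tmpdir/b.sorted" | sed -e "s/^$tab//")

printf '%s\n' "$diff_lines"
rm -rf "$tmpdir"
```

Stripping the tab throws away the information about which file each line came from, so skip the sed step if that matters.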

Cheers!
 
PugTsurani said:
shells/bash's process substitution can be used to combine grep with comm in a single command.
Code:
comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt)

The OP PMed me and explained that he is creating a port for his work, so he cannot rely on bashisms. That isn't explained in the post (it should have been). It's very nice of you to sign up here to help a fellow user. Welcome to the FreeBSD forums!

There is a compare and contrast to your suggestion above at this link:

http://mywiki.wooledge.org/ProcessSubstitution

The example used is this syntax in bash:
Code:
diff <(sort list1) <(sort list2)
would be this in sh:
Code:
mkfifo /var/tmp/fifo1
mkfifo /var/tmp/fifo2
sort list1 >/var/tmp/fifo1 &
sort list2 >/var/tmp/fifo2 &
diff /var/tmp/fifo1 /var/tmp/fifo2
rm /var/tmp/fifo1 /var/tmp/fifo2

Though the second version is more verbose, it is portable to all Bourne-derived shells.

Process substitution is definitely not portable. You may use named pipes to accomplish the same thing.
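
Applied to the grep and comm case from this thread, the named-pipe approach might look like the sketch below. The sample data is made up, and creating the FIFOs under a mktemp -d directory is my own choice to avoid fixed names in /var/tmp; mktemp is widely available but is not part of POSIX.

```shell
# Hypothetical sample data in a private scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' /a/b/c/one /a/b/c/two /x/y/z/skip > "$tmpdir/a.txt"
printf '%s\n' /a/b/c/two /a/b/c/three > "$tmpdir/b.txt"

# One named pipe per input, standing in for each <( ... ).
mkfifo "$tmpdir/fifo1" "$tmpdir/fifo2"

# The writers run in the background; each blocks until comm opens
# the matching pipe for reading.
grep -F '/a/b/c/' "$tmpdir/a.txt" | sort > "$tmpdir/fifo1" &
sort "$tmpdir/b.txt" > "$tmpdir/fifo2" &

# Lines only in the filtered a.txt come first, lines only in b.txt
# are prefixed with a tab; common lines are suppressed by -3.
result=$(comm -3 "$tmpdir/fifo1" "$tmpdir/fifo2")
wait

printf '%s\n' "$result"
rm -rf "$tmpdir"
```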
 