Scanning IPs from selected log files

How do I scan files for unique IP addresses when the IP may not be the first field in each line? Fortunately, httpd-access.log has the IP as the first field, so awk '{if (!unique[$1]++) {print $1}}' /var/log/httpd-access.log works, but how do I do this for something like auth.log, where the IPs appear elsewhere in the line?
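If the IP always sat in some other fixed field, the same associative-array idiom would work just by changing the field number, e.g. (hypothetical log layout with the IP in the third field):
Code:
# hypothetical layout: the IP is always the third field
awk '{if (!unique[$3]++) {print $3}}' /path/to/some.log
But in auth.log the position of the IP varies from line to line.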
 
"Userland programming and scripting" is probably a better place for this, thread moved.

In general I use Perl for things like this, especially for the combination of log files and some clever regular expressions. Granted, I'm quite used to Perl; I've used it for quite a number of years. Still, I think Perl is ideal for this type of situation. It's named the Practical Extraction and Report Language for a reason, and it really excels at tasks like this.
 
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
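The same thing as a ready-to-paste block; sort -u does the work of sort | uniq in a single step:
Code:
# pull out anything that looks like an IPv4 address, then de-duplicate
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort -u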
 
Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
 
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.

Thank you. This worked perfectly!

Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...

Thank you. This worked, too, but printed out duplicates.
 
Thank you. This worked, too, but printed out duplicates.

That is because of the "uniq" command, which does not do what one expects.
I never use it; its name is misleading.
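uniq only collapses adjacent duplicate lines, which is why it is normally fed sorted input (and uniq -u only drops lines that repeat back-to-back, so non-adjacent duplicates all get printed). For example:
Code:
printf 'a\nb\na\n' | uniq          # prints a, b, a -- the repeated "a" survives
printf 'a\nb\na\n' | sort | uniq   # prints a, b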

I will add that Ruby is also very good for this kind of thing. If you try it you will love it, especially if you come from Perl.
 
I never said "it does not work". I said "uniq" has a misleading name, and I still believe that.
It is true that it is not spelled "unique", but you read it that way; that is what makes it misleading.

It would be like calling a command "maximum" and then finding out that "maximum" only works if its
input is sorted... in that case it should not be called "maximum".

BTW, AFAIR (I studied this a long time ago, so I may be talking nonsense now), to work on n
lines a simple "unique" operation takes O(n), while sort + unique takes O(n*log(n)) + O(n).
[disregarding space, for now]

I don't know why "uniq" was implemented the way it is; maybe someone a bit older
knows the rationale. If it had been my decision, I would have made "uniq" do a real "unique"
operation, and maybe "uniq -a" could operate on adjacent lines.
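For what it's worth, awk can already do that one-pass, order-preserving kind of unique (one hash lookup per line, so roughly the O(n) case above); a minimal sketch:
Code:
# print each line only the first time it is seen, keeping input order
awk '!seen[$0]++' data.txt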
 
For example (with Ruby):

Create a 10M-line file, where each line is a random number:
Code:
# write ten million random integers (0..9999), one per line
f = File.open("data.txt", "w")
(1..10_000_000).each do
  f.puts Random.rand(10000)
end
f.close

then create a true "unique" command called "unique.rb":
Code:
#!/usr/local/bin/ruby
# print each input line only the first time it is seen, preserving order
diz = {}
while line = gets
  unless diz.has_key?(line)
    diz[line] = 1
    puts line
  end
end

Now compare "unique.rb" with sort + uniq
Code:
time cat data.txt | sort -n | uniq > data2.txt
real    1m31.001s
user    1m17.652s
sys     0m4.327s

time cat data.txt | ./unique.rb > data3.txt
real    0m8.550s
user    0m7.534s
sys     0m0.275s
 