Scanning IPs from selected log files

How do I scan files for unique IP addresses when the IP may not be the first field in each line? Fortunately, httpd-access.log has the IP as the first field, so awk '{if (!unique[$1]++) {print $1}}' /var/log/httpd-access.log works, but how do I do this for something like auth.log, where the IPs appear elsewhere in the line?
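If the IP always sat in some other fixed field, the same associative-array idiom would work just by changing the field number, e.g. (hypothetical log layout with the IP in the third field):
Code:
# hypothetical layout: the IP is always the third field
awk '{if (!unique[$3]++) {print $3}}' /path/to/some.log
But in auth.log the position of the IP varies from line to line.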
 
"Userland programming and scripting" is probably a better place for this, thread moved.

In general I use Perl for things like this, especially for the combination of log files and some clever regular expressions. Granted, I'm quite used to Perl; I've used it for quite a number of years. Still, I think Perl is ideal for this type of situation. It's named the Practical Extraction and Report Language for a reason, and it really excels at tasks like this.
 
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
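The same thing as a ready-to-paste block; sort -u does the work of sort | uniq in a single step:
Code:
# pull out anything that looks like an IPv4 address, then de-duplicate
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort -u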
 
Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
 
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.

Thank you. This worked perfectly!

Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...

Thank you. This worked, too, but printed out duplicates.
 
Thank you. This worked, too, but printed out duplicates.

That is because of the "uniq" command, which does not do what one expects.
I never use it; its name is misleading.
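uniq only collapses adjacent duplicate lines, which is why it is normally fed sorted input (and uniq -u only drops lines that repeat back-to-back, so non-adjacent duplicates all get printed). For example:
Code:
printf 'a\nb\na\n' | uniq          # prints a, b, a -- the repeated "a" survives
printf 'a\nb\na\n' | sort | uniq   # prints a, b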

I will add that Ruby is also very good for this kind of thing. If you try it you will love it, especially if you come from Perl.
 
I never said "it does not work". I said "uniq" has a misleading name, and I still believe that.
It is true that it is not spelled "unique", but you read it that way; that is what makes it misleading.

It would be like calling a command "maximum" and then finding out that "maximum" only works if its
input is sorted... in that case it should not be called "maximum".

BTW, AFAIR (I studied this a long time ago, so I may be talking nonsense now), to work on n
lines a simple "unique" operation takes O(n), while sort + unique takes O(n*log(n)) + O(n).
[disregarding space, for now]

I don't know why "uniq" was implemented the way it is; maybe someone a bit older
knows the rationale. If it had been my decision, I would have made "uniq" do a real "unique"
operation, and maybe "uniq -a" could operate on adjacent lines.
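For what it's worth, awk can already do that one-pass, order-preserving kind of unique (one hash lookup per line, so roughly the O(n) case above); a minimal sketch:
Code:
# print each line only the first time it is seen, keeping input order
awk '!seen[$0]++' data.txt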
 
For example (with Ruby):

Create a 10M-line file, where each line is a random number:
Code:
# write ten million random integers (0..9999), one per line
f = File.open("data.txt", "w")
(1..10_000_000).each do
  f.puts Random.rand(10000)
end
f.close

then create a true "unique" command called "unique.rb":
Code:
#!/usr/local/bin/ruby
# print each input line only the first time it is seen, preserving order
diz = {}
while line = gets
  unless diz.has_key?(line)
    diz[line] = 1
    puts line
  end
end

Now compare "unique.rb" with sort + uniq
Code:
time cat data.txt | sort -n | uniq > data2.txt
real    1m31.001s
user    1m17.652s
sys     0m4.327s

time cat data.txt | ./unique.rb > data3.txt
real    0m8.550s
user    0m7.534s
sys     0m0.275s
 