Scanning IPs from selected log files

toprank

Member

Thanks: 1
Messages: 32

#1
How do I scan files for unique IP addresses where the IP may not be the first field in each line? Fortunately, httpd-access.log has IPs as the first field so awk '{if (!unique[$1]++) {print $1}}' /var/log/httpd-access.log works, but how do I do this for something like auth.log where the IPs are elsewhere?
 

SirDice

Administrator
Staff member
Administrator
Moderator

Thanks: 6,232
Messages: 27,223

#2
"Userland programming and scripting" is probably a better place for this, thread moved.

In general I use Perl for things like this, especially for combing through log files with some clever regular expressions. Admittedly I'm biased, having used Perl for quite a number of years, but I still think it's ideal for this type of situation. It's called the Practical Extraction and Reporting Language for a reason: it excels at exactly these tasks.
 

linux->bsd

Active Member

Thanks: 35
Messages: 110

#3
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
 

fullauto2012

Active Member

Thanks: 27
Messages: 168

#4
Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
 
OP

toprank

Member

Thanks: 1
Messages: 32

#5
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
Thank you. This worked perfectly!

Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
Thank you. This worked, too, but printed out duplicates.
 

Nicola Mingotti

Well-Known Member

Thanks: 101
Messages: 250

#6
Thank you. This worked, too, but printed out duplicates.
That is because of the "uniq" command, which does not do what one expects.
I never use it; its name is misleading.

I will add that Ruby is also very good for this kind of thing. If you try it you will love it, especially if you come from Perl.
 

Nicola Mingotti

Well-Known Member

Thanks: 101
Messages: 250

#8
I never said "it does not work". I said "uniq" has a misleading name, and I still believe that.
It is true it is not written "unique", but you read it like that; it is misleading.

It would be like calling a command "maximum" when, oh no, "maximum" only works if its input
is sorted ... then it should not be called "maximum".

BTW, AFAIR (I studied this a long time ago, so I may be talking nonsense now), if you need to work on "n"
lines, a simple "unique" operation takes O(n), while sort + unique takes O(n*log(n)) + O(n)
[disregarding space, for now].

I don't know why "uniq" was implemented the way it is; maybe someone a bit older
knows the rationale. If it were my decision, I would have made "uniq" do a real "unique"
operation, and maybe "uniq -a" could operate on adjacent lines.
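[Editor's note: the adjacent-lines behaviour described above is easy to demonstrate with a tiny made-up input.]

```shell
# uniq(1) only collapses *adjacent* duplicate lines, so unsorted input
# can still contain repeats; sorting first groups the duplicates together.
printf 'a\nb\na\n' | uniq          # prints: a b a  (the second "a" survives)
printf 'a\nb\na\n' | sort | uniq   # prints: a b    (duplicates are now adjacent)
```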
 

Nicola Mingotti

Well-Known Member

Thanks: 101
Messages: 250

#9
For example (with Ruby):

Create a 10M-line file where each line is a random number:
Code:
# write ten million random integers in 0...10000, one per line
f = File.open("data.txt", "w")
(1..10_000_000).each do |x|
  f.puts Random.rand(10000)
end
f.close
then create a true "unique" command called "unique.rb":
Code:
#!/usr/local/bin/ruby
# one-pass unique: print each line only the first time it is seen
diz = {}
while line = gets
  unless diz.has_key?(line)
    diz[line] = 1
    puts line
  end
end
Now compare "unique.rb" with sort + uniq
Code:
time cat data.txt | sort -n | uniq > data2.txt
real    1m31.001s
user    1m17.652s
sys     0m4.327s

time cat data.txt | ./unique.rb > data3.txt
real    0m8.550s
user    0m7.534s
sys     0m0.275s
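[Editor's note: not part of the benchmark above, but the same one-pass, hash-based unique is also a classic awk idiom, which likewise avoids the sort entirely.]

```shell
# Print each line the first time it occurs; seen[] is a hash table,
# so this is a single O(n) pass, like the Ruby script above.
awk '!seen[$0]++' data.txt
```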
 