
Scanning IPs from selected log files

toprank

Member

Thanks: 1
Messages: 32

#1
How do I scan files for unique IP addresses when the IP may not be the first field in each line? Fortunately, httpd-access.log has the IP as the first field, so awk '{if (!unique[$1]++) {print $1}}' /var/log/httpd-access.log works, but how do I do this for something like auth.log, where the IPs are elsewhere?
 

SirDice

Administrator
Staff member
Moderator

Thanks: 5,873
Best answers: 6
Messages: 26,461

#2
"Userland programming and scripting" is probably a better place for this, thread moved.

In general I use Perl for things like this, especially for the combination of log files and some clever regular expressions. Admittedly I'm quite used to Perl, having worked with it for a good number of years, but I still think it's ideal for this type of situation. It's named Practical Extraction and Reporting Language for a reason; it really excels at tasks like this.
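For instance, something along these lines should work (just a sketch, matching IPv4 addresses only; the %seen hash keeps the first occurrence of each address and drops repeats):
Code:
# print each IPv4-looking match the first time it is seen
perl -ne 'print "$1\n" if /(\d{1,3}(?:\.\d{1,3}){3})/ && !$seen{$1}++' /var/log/auth.log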
 

linux->bsd

Active Member

Thanks: 35
Messages: 109

#3
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
 

fullauto2012

Active Member

Thanks: 27
Messages: 165

#4
Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
 

toprank

Member

Thanks: 1
Messages: 32

#5
As SirDice said, Perl is probably the way to go. But if you don't mind piping together a few command line programs, just do something like this: grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/auth.log | sort | uniq. Probably best not to expand that to support IPv6.
Thank you. This worked perfectly!

Code:
cat auth.log | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | uniq -u
will output only the uniq IPs in that file...
Thank you. This worked, too, but printed out duplicates.
 

Nicola Mingotti

Active Member

Thanks: 88
Messages: 184

#6
Thank you. This worked, too, but printed out duplicates.
That is because of the "uniq" command, which does not do what one expects: it only compares adjacent lines, so duplicates that are not next to each other slip through.
I never use it; its name is misleading.
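A quick demonstration of that adjacency behaviour (a made-up three-line input):
Code:
# uniq only collapses adjacent duplicates, so the second "a" survives
printf 'a\nb\na\n' | uniq
# sorting first brings the duplicates together, so they collapse
printf 'a\nb\na\n' | sort | uniq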

I will add that Ruby is also very good for this kind of thing. If you try it you will love it, especially if you come from Perl.
 

Nicola Mingotti

Active Member

Thanks: 88
Messages: 184

#8
I never said "it does not work". I said "uniq" has a misleading name, and I still believe that.
It is true that it is not written "unique", but you read it like that; it is misleading.

It is like calling a command "maximum" which, oh no, only works if its input
is sorted... then it should not be called "maximum".

BTW, AFAIR (I studied this a long time ago, so I may be talking nonsense now) if you need to work on n
lines, a simple hash-based "unique" operation takes O(n), while a sort + unique takes O(n*log(n)) + O(n).
[disregarding space, for now]

I don't know why "uniq" was implemented the way it is; maybe someone a bit older
knows the rationale. If it had been my decision, I would have made "uniq" do a real "unique"
operation, and maybe "uniq -a" could operate on adjacent lines.
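As an aside, that single-pass "real unique" is exactly what the classic awk idiom gives you: order-preserving and roughly O(n) with hashing (here "file" is just a placeholder):
Code:
# print each line the first time it is seen; later repeats are skipped
awk '!seen[$0]++' file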
 

Nicola Mingotti

Active Member

Thanks: 88
Messages: 184

#9
For example (with Ruby):

Create a 10M-line file, where each line is a random number:
Code:
# write ten million random integers (0..9999), one per line
f = File.open("data.txt", "w")
(1..10_000_000).each do |x|
  f.puts Random.rand(10000)
end
f.close
Then create a true "unique" command called "unique.rb":
Code:
#!/usr/local/bin/ruby
# print each input line the first time it is seen, skipping later repeats
diz = {}
while line = gets do
  unless diz.has_key?(line)
    diz[line] = 1
    puts line
  end
end
Now compare "unique.rb" with sort + uniq:
Code:
time cat data.txt | sort -n | uniq > data2.txt
real    1m31.001s
user    1m17.652s
sys     0m4.327s

time cat data.txt | ./unique.rb > data3.txt
real    0m8.550s
user    0m7.534s
sys     0m0.275s
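If you want to check that the two commands produce the same set of values (a hypothetical follow-up; data3.txt is in first-seen order, so sort it before comparing):
Code:
# data2.txt is already numerically sorted; sorting data3.txt the same way should match it exactly
sort -n data3.txt | diff - data2.txt && echo "same set"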
 