Apache, PHP, Perl, and STDIN not working as expected.

Hello all,

Sorry if this is in the wrong forum, as I am not sure if this is an Apache problem or a coding issue (which I am leaning towards).

I have written a PHP script that logs Apache access entries into a Postgres database on the fly. The PHP script reads STDIN, but occasionally won't log some entries - primarily Googlebot.

I am running FreeBSD 8.0-RELEASE-p2, Apache 2.2.14_5, and PHP 5.3.2

Here is an excerpt from my httpd.conf:
ErrorLog "/var/log/apache/httpd-error.log"
CustomLog "/var/log/apache/httpd-access.log" combined
CustomLog |/www/postlog/log.php combined
CustomLog |/www/postlog/log.pl combined


Below is the short the PHP script, log.php, which I am having problems with:
PHP:
#!/usr/local/bin/php
<?php
$stdin = fopen('php://stdin', 'r');
$dbg_log = fopen('/www/postlog/debug_log', 'a');

ob_implicit_flush(true);

while ($line = fgets($stdin)) {
            fwrite ($dbg_log, $line);
             /*
             regex and database importing
             */
}
?>

The perl script, log.pl that I wrote in order to help troubleshoot:
Code:
#!/usr/bin/perl

use strict;
$|=1; # Use unbuffered output
open (STDOUT, '>/www/postlog/perllog') or die $!;

while(<STDIN>) {
        print STDOUT $_;
}

What is odd is that the following two hits are not logged:
Code:
66.249.65.108 - - [15/May/2010:07:43:54 -0400] "GET /robots.txt HTTP/1.1" 200 70 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; 
+http://www.google.com/bot.html)"
66.249.65.108 - - [15/May/2010:07:43:54 -0400] "GET /user/login?destination=comment%2Freply%2F7%23comment-form HTTP/1.1" 404 208 "-" 
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

But the next two hits are logged:
Code:
66.249.65.119 - - [15/May/2010:08:46:08 -0400] "GET /robots.txt HTTP/1.1" 200 70 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; 
+http://www.google.com/bot.html)"
66.249.65.119 - - [15/May/2010:08:46:08 -0400] "GET /whois.php HTTP/1.1" 200 1202 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; 
+http://www.google.com/bot.html)"

While I don't believe to be my code (I could be wrong), I am wondering if I am overlooking something, or if there is a problem with Apache and STDIN.

Thanks,
James
 
Just a random thought, but normally Apache forks a number of times, so the four google requests could be served by two (or more) different processes, which may be the cause of your problem.

A a test, you could try running just one fork to serve requests:
Code:
  StartServers 1
  MinSpareServers 1
  MaxSpareServers 1
 
jedwards, in your scripts you opening files for rewrite(at least perl version).
And, we see in log these requests are at same time and may be they going simultaneously. So, your script ignoring possibility of many requests at same time.
 
Thanks for your replies. I went to reinstall Apache today, and I noticed this:

Code:
  APACHE_BUFFERED_LOGS=yes      Log entries are buffered before writing.
                                Writes may not be atomic, entries from multiple
                                children could become mixed together and your
                                web stats may be inaccurate.
                                [default is no]

At this point it definitely seems to be Apache. I just need to figure it out further from there.
 
Back
Top