Inconsistent Apache Error

I have a script running which builds a web page, which is then displayed on a plasma TV controlled by a PC. This runs continually throughout the day, waiting 30 seconds after it finishes generating the page before starting again. The page itself is set to reload every 15 seconds (via an HTML <meta http-equiv='refresh' content='15'> tag) to pick up the changes shortly after they happen. To prevent problems like the browser reloading the page while the script is still building it (which does take a few seconds to parse/format the data), I build the page to a temp file and then move it to the /usr/local/www/apache22/data directory, overwriting the file that was already there. It's worked perfectly so far with one exception.

After several days of running, a 403 Forbidden error was displayed on the plasma. I tapped F5 on the PC to reload the page and all was then well. Figuring this might be the 1 in a million long shot that the PC was reloading the page (local network, so we're talking next to no load time) in the same split instant the mv overwrote the file, I decided to just monitor it. That was early last week, but earlier today the same thing happened and I no longer believe it's coincidence. Regardless of what the root cause is, I need to find a solution.

The apache error logs show:

Code:
[Wed Nov 07 08:36:59 2012] [error] [client xxx.xxx.xxx.xxx] (13)Permission denied: file permissions deny server access: /usr/local/www/apache22/data/PlasmaFile.html

Since it's a script generating the hypertext file, the permissions should always be the same. I do nothing with the permissions of the generated file, which are -rw-r--r-- and root:wheel for ownership.

Other than adding a few lines to my script to change permissions/ownership of the file before it's moved over (already done and monitoring), I'm at a loss. The main problem from a troubleshooting standpoint is that this build/reload process happens hundreds or even thousands of times before the problem occurs, so I won't know for several days whether anything I do fixes it. This inconsistency is what led me to think that the problem was a timing issue of the script overwriting the publicly viewable file while the page was being loaded, but this is a very long shot at best.

Any ideas of how to fix it, what's causing the problem, or even how to figure out what's happening?
 
Even if it is quite unlikely, in rare cases (perhaps once in a thousand times) it may happen that Apache tries to read the file while the old one is being replaced by moving the new one on top of it. Apache fails (status code 403) because, during the move, the file is blocked for access by other processes. Once the move has completed, the file can be read without error, until the server again happens to read it within the time frame of a later move.

Note also that in your case mv involves an rm first, because the old file must be taken away in order to put the new one into place, and this takes more time than a simple move to an unused name. So, for a very short period of time, the file does not even exist and Apache would return status code 404.

Furthermore, if your temp file resides on a separate file system, perhaps /tmp, then the simple mv is actually rm htmlfile; cp tmpfile htmlfile; rm tmpfile. So you want to make sure that the temp file resides on the same file system as PlasmaFile.html.
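
To illustrate the same-file-system point, here is a minimal C sketch, not a drop-in implementation. It assumes the generator can hand over the finished page as a string; publish_page and its parameters are hypothetical names. The temp file is created in the target directory itself, so the final step is a single rename() call rather than a copy. Since mkstemp() creates the file with mode 0600, the sketch sets the mode to 0644 explicitly before the rename, matching the -rw-r--r-- permissions mentioned above.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and fchmod() under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write `content` to a temp file created inside `dir`
 * (so it shares a file system with the target), then rename it to dir/name.
 * Returns 0 on success, -1 on failure. */
int publish_page(const char *dir, const char *name, const char *content)
{
    char tmp_path[PATH_MAX];
    char final_path[PATH_MAX];
    assert(dir != NULL && name != NULL && content != NULL);

    /* Build both paths; refuse to continue if either would be truncated. */
    int n1 = snprintf(tmp_path, sizeof tmp_path, "%s/.%s.XXXXXX", dir, name);
    int n2 = snprintf(final_path, sizeof final_path, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof tmp_path || (size_t)n2 >= sizeof final_path)
        return -1;

    int fd = mkstemp(tmp_path);            /* file is created with mode 0600 */
    if (fd == -1)
        return -1;

    size_t len = strlen(content);
    /* A short write is simply treated as a failure in this sketch. */
    int ok = write(fd, content, len) == (ssize_t)len
          && fchmod(fd, 0644) == 0;        /* make it readable by the web server */
    if (close(fd) != 0)
        ok = 0;

    /* Same directory, hence same file system: no copy is involved here. */
    if (ok && rename(tmp_path, final_path) == 0)
        return 0;

    unlink(tmp_path);                      /* clean up the temp file on failure */
    return -1;
}
```

The generator would then call something like publish_page("/usr/local/www/apache22/data", "PlasmaFile.html", page) once the page text is complete.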

In neither case can mv be considered an atomic operation, which means that you have to synchronize Apache and your script in order to keep both from accessing the same resource at the same time. I suggest you write a CGI or PHP script which checks, and if necessary waits, until the resource is ready for reading, then reads it in and serves it.

Best regards

Rolf
 
Did exactly as you suggested - created a CGI script that reads & spits out the HTML file but which checks for a .lock file before reading. I then modified the data parsing script to touch the .lock file before the mv and rm the .lock immediately afterward. Only problem that *could* pop up now is if the mv fires during the time the CGI script is reading the file.

Still find it very hard to believe that operations which each take a fraction of a second are stepping on each other, but that's what the evidence points to. Guess I'll know in a few days or weeks if it's fixed.
 
Ruler2112 said:
Did exactly as you suggested - created a CGI script that reads & spits out the HTML file but which checks for a .lock file before reading. I then modified the data parsing script to touch the .lock file before the mv and rm the .lock immediately afterward. Only problem that *could* pop up now is if the mv fires during the time the CGI script is reading the file.

Still find it very hard to believe that operations which each take a fraction of a second are stepping on each other, but that's what the evidence points to. Guess I'll know in a few days or weeks if it's fixed.

You would need a second lock file that is touched by the CGI script and tested by the file generation script before issuing mv.
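
As a rough sketch of the locking idea in C: instead of two separate lock files, this variant swaps in a single lock file that both the CGI script and the generator must create exclusively (open() with O_CREAT | O_EXCL) before touching PlasmaFile.html, so whichever side comes second waits. The function names, the lock path, and the 10 ms retry pause are assumptions for illustration only.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try to create `lock_path` exclusively, retrying for
 * up to about `timeout_s` seconds. Returns 0 once the lock is held,
 * -1 on timeout or on any error other than "lock already exists". */
int acquire_lock(const char *lock_path, int timeout_s)
{
    assert(lock_path != NULL);
    time_t deadline = time(NULL) + timeout_s;
    struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    for (;;) {
        /* O_EXCL makes open() fail with EEXIST if the file is already there. */
        int fd = open(lock_path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd != -1) {
            close(fd);
            return 0;
        }
        if (errno != EEXIST || time(NULL) > deadline)
            return -1;
        nanosleep(&pause, NULL);   /* avoid spinning at full speed */
    }
}

/* Hypothetical helper: give the lock up again by removing the lock file. */
int release_lock(const char *lock_path)
{
    return unlink(lock_path);
}
```

The generator would wrap its mv/rename in acquire_lock()/release_lock(), and the CGI script would do the same around reading and sending the file. A lock file left behind by a crashed process would block the other side until the timeout, so some cleanup strategy is still needed.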

However, I am almost sure that you could resolve everything with simple while loops in both the CGI script and the generator script. The following C excerpts are meant to show the concept, and it should be possible to do something similar in almost any other language.

CGI:

Code:
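...
/* requires <stdio.h> and <time.h> */
time_t timeout = time(NULL) + 3;   /* give up after roughly 3 seconds */
FILE  *pfhtml;

/* keep retrying until the file opens or the timeout expires */
while ((pfhtml = fopen("PlasmaFile.html", "r")) == NULL && time(NULL) <= timeout)
   ;

if (pfhtml) {
   // read-in and send the file
   fclose(pfhtml);
} else {
   // send an error message
}
...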


HTML-Generator:

Code:
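...
/* requires <stdio.h> (for rename) and <time.h> */
time_t timeout = time(NULL) + 3;
int    rc;

/* retry the rename until it succeeds or the timeout expires */
while ((rc = rename("path/to/tmpFile", "path/to/PlasmaFile.html")) == -1 && time(NULL) <= timeout)
   ;

if (rc == -1) {
   // do any proper error handling
}
...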

As for the probability of the conflict, you are practically inviting it, since the two intervals (15 and 30 seconds) share common divisors. Even if both processes initially start in the middle of each other's sleep, systematic drift means that sooner or later they will meet in the same fraction of a second.

It might help a lot if you set the reload interval to 13 or 17 seconds and the generation interval to 29 or 31 seconds (13, 17, 29, and 31 are prime numbers).
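
To put rough numbers on this, the small C sketch below computes how often two fixed intervals line up, which is their least common multiple. It is a simplification that treats both as exact periods and ignores the few seconds of generation time and any drift; the function names are made up for illustration. With 15 and 30 the schedules coincide every 30 seconds, i.e. on every single generation, whereas 17 and 31 coincide only every 527 seconds and 13 and 29 every 377 seconds.

```c
#include <assert.h>

/* Greatest common divisor via Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: seconds between moments at which a reload and a
 * generation fall on the same tick, i.e. lcm(reload_s, generate_s),
 * assuming both run as exact fixed periods. */
unsigned seconds_between_coincidences(unsigned reload_s, unsigned generate_s)
{
    assert(reload_s > 0 && generate_s > 0);
    return reload_s / gcd(reload_s, generate_s) * generate_s;
}
```

For example, seconds_between_coincidences(15, 30) gives 30, while seconds_between_coincidences(17, 31) gives 527.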
 