spamd (spam trapping) crashes

I am running FreeBSD 7.2-STABLE with spamd from OpenBSD, which I use for spam trapping (greylisting, etc.).

I notice that spamd sometimes crashes, which causes unknown SMTP connections to be rejected. After watching spamd.log for a while, I know which error is responsible for this behaviour: every time the following error shows up in the log, spamd exits.

<snip>
greyreader failed (No such file or directory)
</snip>

I still have no clue what to do about this error. From reading the source, I see that "greyreader" is a function. I am not that good at reading C, so can anybody help with this one?

Thanks in advance.
 
ironmikie said:
<snip>
greyreader failed (No such file or directory)
</snip>

Do you run CPANEL or something else?

If something sends a signal to spamd, you will see a message like the one you noticed.
Please build spamd with the parameter -DCPANEL; the resulting binary will then be obspamd instead of spamd.

You can run the port for a few days without redirecting traffic to spamd, to see whether the process dies again.
Please test this both with and without the -DCPANEL build parameter; if the process dies again, open a PR.
 
Yes, I have the same problem: obspamd dies unexpectedly almost every hour.
Code:
FreeBSD 8.2-Stable

gate ~# pfctl -t spamd-white -T show | wc -l
      41 
gate ~# pfctl -t blocked -T show | wc -l
   95413
Maybe obspamd can't cope with large pf tables?

BTW, I don't run CPANEL, and there are no typos in obspamd_flag= etc.

How can I debug this problem?
Any ideas?
 
Hm, "almost every hour" sounds like spamd is getting a signal from outside.

The largest spamdb I've seen was ~900MB with a few hundred thousand entries on an 8.x x64 machine.

Do you have any log entries?
Can you check whether something runs every hour (for example cron jobs, spamd-setup, ...)?
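Whether spamd is killed from outside or crashes on its own shows up in the terminating signal: kill(1) sends SIGTERM (15) by default, while an invalid memory access inside the process raises SIGSEGV (11). A parent process can read that signal from the wait status. The following is a minimal C sketch of that idea, not code from spamd or its port; the child deliberately raises SIGSEGV to stand in for a crash.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that crashes, then report how it terminated.
 * Returns the terminating signal, 0 for a normal exit, -1 on error.
 */
static int child_term_signal(void)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: restore the default action and simulate the crash. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0);  /* not reached while SIGSEGV has its default action */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        printf("pid %ld exited on signal %d\n", (long)pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    if (WIFEXITED(status))
        printf("pid %ld exited with status %d\n", (long)pid, WEXITSTATUS(status));
    return 0;
}
```

Only the parent receives the wait status, so a cron-based checker cannot see it this way; for a daemon that is not your child, the kernel's "exited on signal" line in /var/log/messages is the place to look.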
 
Not exactly every hour, and I can't find any correlation with cron jobs. Here are yesterday's statistics from the daemon:
Code:
2011-12-01 03:17:00 The process obspamd is dead (63508).
2011-12-01 03:17:03 The process obspamd is running (70757).
2011-12-01 04:36:00 The process obspamd is dead (70757).
2011-12-01 04:36:02 The process obspamd is running (72229).
2011-12-01 05:41:00 The process obspamd is dead (72229).
2011-12-01 05:41:02 The process obspamd is running (73479).
2011-12-01 07:10:00 The process obspamd is dead (73479).
2011-12-01 07:10:02 The process obspamd is running (75305).
2011-12-01 09:03:00 The process obspamd is dead (75305).
2011-12-01 09:03:02 The process obspamd is running (77987).
2011-12-01 10:17:00 The process obspamd is dead (77987).
2011-12-01 10:17:03 The process obspamd is running (82077).
2011-12-01 10:47:00 The process obspamd is dead (82077).
2011-12-01 10:47:02 The process obspamd is running (83502).
2011-12-01 11:20:00 The process obspamd is dead (83502).
2011-12-01 11:20:02 The process obspamd is running (85314).
2011-12-01 16:04:00 The process obspamd is dead (95001).
2011-12-01 16:04:02 The process obspamd is running (9791).
2011-12-01 17:03:00 The process obspamd is dead (9791).
2011-12-01 17:03:03 The process obspamd is running (15587).
2011-12-01 18:02:00 The process obspamd is dead (15587).
2011-12-01 18:02:02 The process obspamd is running (20956).
2011-12-01 19:17:00 The process obspamd is dead (20956).
2011-12-01 19:17:02 The process obspamd is running (27318).

A cron job checks service availability every minute.
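To check the "not exactly every hour" observation against cron schedules, the gaps between crashes can be computed from the watchdog log. A minimal C sketch, assuming the exact line format shown above and that all entries are from the same day (crossing midnight is not handled); parse_dead_minute() and crash_gaps() are hypothetical helpers, not part of any tool mentioned in this thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one watchdog log line such as
 *   "2011-12-01 03:17:00 The process obspamd is dead (63508)."
 * Returns 1 and stores the minute of the day if it is a "dead" entry.
 */
static int parse_dead_minute(const char *line, int *minute)
{
    int y, mo, d, h, mi, s;

    if (sscanf(line, "%d-%d-%d %d:%d:%d", &y, &mo, &d, &h, &mi, &s) != 6)
        return 0;
    if (strstr(line, " is dead ") == NULL)
        return 0;
    *minute = h * 60 + mi;
    return 1;
}

/*
 * Store the minutes between consecutive "dead" entries in gaps[].
 * Assumes all lines are from the same day. Returns the number stored.
 */
static int crash_gaps(const char *const *lines, size_t n,
                      int *gaps, size_t max_gaps)
{
    int prev = -1, cur;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!parse_dead_minute(lines[i], &cur))
            continue;
        if (prev >= 0 && count < max_gaps)
            gaps[count++] = cur - prev;
        prev = cur;
    }
    return (int)count;
}
```

Fed the 03:17, 04:36 and 05:41 "dead" lines above, crash_gaps() yields gaps of 79 and 65 minutes, which does not look like a fixed cron interval.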
 
In /var/log/messages:

Code:
Dec  1 09:02:11 orion kernel: pid 75305 (spamd), uid 132: exited on signal 11
Dec  1 09:02:11 orion spamd[75308]: greyreader failed (No such file or directory)
Dec  1 09:03:00 orion spamd[77987]: listening for incoming connections.
 
I think I used to have a similar problem when I was running my server, but it crashed perhaps once a week, or a few times a month at most.

(I was using ipfw.)
 
Code:
2011-12-01 18:02:00 The process obspamd is dead (15587).
2011-12-01 18:02:02 The process obspamd is running (20956).
2011-12-01 19:17:00 The process obspamd is dead (20956).
2011-12-01 19:17:02 The process obspamd is running (27318).

Hm, the time between failing and running again is really short.
Are you perhaps monitoring the proctitle, which changes while the (internal) greyreader process runs?


val said:
Code:
Dec  1 09:02:11 orion spamd[75308]: greyreader failed (No such file or directory)

Can you try to start with a fresh spamdb? If greyreader fails, it is usually a broken pipe or a corrupt database.
 
ohauer said:
Hm, the time between failing and running again is really short.
That's because a cron task runs every minute and restarts the dead daemon.

ohauer said:
Are you perhaps monitoring the proctitle, which changes while the (internal) greyreader process runs?
Yes, I'm ready to try, but I have no clear idea how to implement this.


ohauer said:
Can you try to start with a fresh spamdb? If greyreader fails, it is usually a broken pipe or a corrupt database.
No success.
 
OK, I think there is memory corruption in spamd.c, approximately at this place or just after it:
Code:
syslog_r(LOG_DEBUG, ... "Body: %s", cp->addr, p);
Maybe due to compiler flags or something else?
PS: If spamd is compiled with the debug flag, the error doesn't occur.
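If the suspicion about this syslog_r() call is right, one classic cause is a buffer passed to %s that is not NUL-terminated: the formatter keeps reading past the end of the data until it happens to find a zero byte, and whether that crashes often depends on stack and register layout, which changes with the optimization level. That this is what happens in spamd.c is an assumption, not something established in this thread. A minimal C sketch of the defensive pattern passes an explicit length with %.*s; the helper format_body() and its format string are hypothetical, and snprintf() stands in for syslog_r() so the result can be inspected.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a log line from a body buffer that may
 * lack a NUL terminator. Only the first len bytes of p are read,
 * because "%.*s" stops at the given precision.
 * Returns what snprintf() returns (the untruncated length).
 */
static int format_body(char *out, size_t outsz, const char *addr,
                       const char *p, size_t len)
{
    /* The precision argument of "%.*s" is an int, so clamp len. */
    int n = len > (size_t)INT_MAX ? INT_MAX : (int)len;
    return snprintf(out, outsz, "[%s] Body: %.*s", addr, n, p);
}
```

With a 4-byte buffer holding "abcd" and no terminator, format_body() produces "[192.0.2.1] Body: abcd". syslog(3) uses printf-style formatting, so the same %.*s precision works there.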
 
val said:
Maybe due to compiler flags or something else?
PS: If spamd is compiled with the debug flag, the error doesn't occur.

This sounds strange... (it never happened to me).

Have you used non-default optimization flags before?
 
The problem occurs if the compiler optimization flag -O2 is used.
Snippet from my make.conf:
Code:
CPUTYPE?=pentium3
CFLAGS=-O2 -pipe
COPTFLAGS=-O2 -pipe

With this make.conf, signal 11 doesn't occur:
Code:
CPUTYPE?=pentium3
CFLAGS=-O -pipe
COPTFLAGS=-O2 -pipe
 
val said:
The problem occurs if the compiler optimization flag -O2 is used.

If this works, I'll be contacting the maintainer. Also, there needs to be a proper knob for renaming the binary to obspamd.
 
Those flags should go into the port's Makefile; setting them unconditionally in /etc/make.conf is an error. Setting CPUTYPE is also an error unless you're cross-compiling for a different machine with a different type of CPU.
 
Compiling with CFLAGS=-O -pipe in the port's Makefile did not help; the crash happened again last night. I haven't tested with debug yet, but that seems a bit extreme.
 
I am facing a problem where obspamd-update -b crashes my FreeBSD 9.0-p3 AMD64 machine completely. It loads 170,000 or more IPs into a local table.

Might this issue be related?
 