flowcleaner issue

Hi my friends, I have a big issue: my server stops working and I still cannot track down what is causing it.

I have a spam gateway running:

spamassassin
clamav
amavis
apache+mailgraph
postfix
bacula client
apcupsd client

My server is running FreeBSD 8.0-RELEASE-p2 amd64 on RAID-1.

Code:
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-RELEASE-p2 #5: Mon May 10 23:23:20 PDT 2010
    root@filter.oakwest.com:/usr/obj/usr/src/sys/SPAMKER
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU           E3110  @ 3.00GHz (2999.68-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x408e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,XSAVE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4119261184 (3928 MB)
ACPI APIC Table: <090208 APIC1432>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1

For some reason that I still don't know, amavisd stops working: its port no longer answers any inbound connections. postfix, spamd and clamd keep working; amavisd is normally the one with the issue.

My server can work fine for a month. Each weekend I update the ports and upgrade my packages, and the spamd rules are updated every month.

There is no issue at first, then suddenly after a month or 4-5 weeks the server stops working: amavisd stops doing its job. I can still access the server via ssh and see the issue.

This is the 4th time this has happened.

My CPU idle drops to 50%, and when I start debugging I see a process called "FlowCleaner" eating one CPU.

It always has PID 20, so it looks like an internal process.

I cannot kill this process. I can restart clamd, spamd, apache and bacula, but amavisd is stuck.

If I run shutdown -r now it doesn't work: the server hangs, and pressing the power button doesn't work either.

I have to do a cold reboot; I have no other choice.

I have read the logs (maillog, console, all, messages) but don't see any issue.

One thing to mention is that every time this happens, my server's fans run faster than usual. The first day I noticed this, I created a batch job that emails me the core temperature every 5 minutes, using the FreeBSD coretemp.ko module.

On average I get about 40°C for each core:

Code:
dev.cpu.0.temperature: 42.0C
dev.cpu.1.temperature: 41.0C

I searched the Intel site and this chip's maximum temperature is 70°C. I have never seen that number in my emails; the highest has been 54°C, and this appeared before my server went crazy yesterday:

Code:
dev.cpu.0.temperature: 53.0C
dev.cpu.1.temperature: 49.0C
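
For anyone who wants to watch the temperatures the same way, here is a minimal sketch of a filter for the sysctl output shown above. The function name hot_cores and the 50 degree limit are only assumptions for illustration; this is not the batch job from the post.

```shell
#!/bin/sh
# Hypothetical helper: read "dev.cpu.N.temperature: 42.0C" lines on stdin
# and print only the cores at or above a limit (degrees C, first argument).
hot_cores() {
    limit="$1"
    awk -v limit="$limit" '
        /^dev\.cpu\.[0-9]+\.temperature:/ {
            temp = $2
            sub(/C$/, "", temp)              # strip the trailing unit
            if (temp + 0 >= limit + 0) print $0
        }'
}

# On the FreeBSD box itself (needs coretemp.ko loaded):
#   sysctl dev.cpu | hot_cores 50
```

Run from cron every 5 minutes, the output could be mailed only when it is non-empty, which keeps the mailbox quiet while the server is healthy.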

Something is causing this: a spam attack, a memory leak? Although I have never seen my server use the whole 4GB.

My friends, what do you recommend I do to track this down?

Any input will be much appreciated. I have 1 month to track this, thanks!
 
After some research, I have found that there is a bug in FreeBSD 8. It is more related to routers, but I see the same behavior:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/146792

I have disabled the flowtable with sysctl:

Code:
net.inet.flowtable.enable: 0
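
A rough sketch of how the setting above could be applied and kept across reboots. The sysctl name comes from the output above; the helper set_sysctl_conf is a made-up name for illustration, and it assumes the usual KEY=VALUE layout of /etc/sysctl.conf.

```shell
#!/bin/sh
# Live change on the running system (run as root); lost at reboot:
#   sysctl net.inet.flowtable.enable=0

# Hypothetical helper: write KEY=VALUE into a sysctl.conf-style file,
# replacing any existing line for KEY rather than appending a duplicate.
set_sysctl_conf() {
    key="$1"; value="$2"; file="$3"
    tmp="${file}.tmp.$$"
    {
        # Keep every line that does not start with "KEY=".
        [ -f "$file" ] && awk -v k="${key}=" 'index($0, k) != 1' "$file"
        echo "${key}=${value}"
    } > "$tmp"
    mv "$tmp" "$file"
}

# Persist the setting across reboots (run as root):
#   set_sysctl_conf net.inet.flowtable.enable 0 /etc/sysctl.conf
```

Running the helper twice leaves a single line for the key, so it is safe to call from a setup script.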

Now I need to wait and see what happens. I will also remove this line from my kernel config file and rebuild my kernel:

Code:
options        FLOWTABLE               # per-cpu routing cache
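
A minimal sketch of that kernel change, assuming the config file is the SPAMKER one named in the boot messages earlier in the thread and lives in the standard /usr/src/sys/amd64/conf directory. The helper name without_flowtable is hypothetical; the build steps in the comments are the usual buildkernel/installkernel sequence, not something taken from the post.

```shell
#!/bin/sh
# Hypothetical helper: comment out the FLOWTABLE option in a kernel config
# file and print the result on stdout (the input file is left untouched).
without_flowtable() {
    sed 's/^options[[:space:]][[:space:]]*FLOWTABLE/#&/' "$1"
}

# Assumed workflow on the server (run as root):
#   cd /usr/src/sys/amd64/conf
#   without_flowtable SPAMKER > SPAMKER.new && mv SPAMKER.new SPAMKER
#   cd /usr/src
#   make buildkernel KERNCONF=SPAMKER
#   make installkernel KERNCONF=SPAMKER
#   shutdown -r now
```

Commenting the line out instead of deleting it makes it easy to put the option back later if a fixed release comes out.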

I will let you know, thanks!
 
Hello,

Yes, recompiling the kernel without flowtable resolves the issue. I can confirm the same behaviour (50% CPU idle) on 8-STABLE (installed on 11.09.2010) with a BGP full view (~320k routes). CPU load was 3-4 (normal for my box is 0-1). I cannot say how long the server would have stayed alive, because as soon as I saw this behaviour I recompiled the kernel without flowtable, and now it is OK.

Code:
FreeBSD ( hidden ) 8.1-STABLE FreeBSD 8.1-STABLE #14: Sat Sep 11 15:49:17 EEST 2010     
bobi@( hidden ):/usr/obj/usr/src/sys/GENERIC  amd64

Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) II X3 435 Processor (2913.10-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f52  Family = 10  Model = 5  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant
real memory  = 2147483648 (2048 MB)
avail memory = 2021892096 (1928 MB)
ACPI APIC Table: <042110 APIC1034>
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 
In some cases your server can become unresponsive (a livelock) after a few days if the FlowCleaner process continues to eat CPU.
 
Hi my friends.

I have recompiled my kernel without the flowtable and booted the new kernel. Now I have to wait a while and see whether I have finally fixed this issue.

Thanks all for your time and input!!!
 
It has been 22 days without an issue, so it looks like the flowtable was causing the problem. My spam server is working as desired:

Code:
9:44AM  up 22 days,  9:28, 2 users, load averages: 0.00, 0.01, 0.00

Thanks to all.