Hello all, forgive me if this is not the correct place for this, but I'd rather contribute to this forum than sign up for another. I have some questions for anyone experienced with training SpamAssassin's Bayesian filter.
I have two mail servers. One is in production and the other is one I am developing. Soon I will swap the development server in for the production server, so the dev server will eventually take over the production server's hostname, domain, and IP. I work under a policy that prohibits working on production servers as much as possible, so I want to train SpamAssassin before going to production. However, I am worried that the filter may be skewed because I am training it on mail taken from another server, even though the mail stream is the same.
I am thinking of having sa-learn ignore most of the headers (maybe all of them, now that I look at them) so that header data doesn't affect the filter, since the headers in my spam and ham are similar.
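SpamAssassin's local.cf has a `bayes_ignore_header` option for excluding individual headers from Bayes tokenization. A minimal sketch of generating those lines, assuming the header names below (they are examples only, not a recommendation of which headers to ignore) and a hypothetical helper name `bayes_ignore_lines`:

```python
# Hypothetical helper: build "bayes_ignore_header" lines to paste into SpamAssassin's local.cf.
# bayes_ignore_header is a real SpamAssassin option; the header names used below are examples only.

def bayes_ignore_lines(headers):
    """Return one 'bayes_ignore_header <Name>' line per header, de-duplicated case-insensitively."""
    seen = set()
    lines = []
    for name in headers:
        cleaned = name.strip()
        key = cleaned.lower()
        if key and key not in seen:  # skip blanks and repeats such as "x-spam-status"
            seen.add(key)
            lines.append(f"bayes_ignore_header {cleaned}")
    return lines


if __name__ == "__main__":
    example = ["X-Spam-Status", "X-Spam-Level", "x-spam-status"]
    print("\n".join(bayes_ignore_lines(example)))
```

Header names are matched case-insensitively here only to avoid duplicate lines; which headers are actually worth ignoring would still need testing against the real mail stream.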
Also, both servers are set up to add ******SPAM****** to the beginning of the subject line. I read that sa-learn accounts for and ignores this tag, but does that still hold when the mail comes from another server, even though SpamAssassin is configured the same way on both? I'm not sure.
So if I ignore the headers, sa-learn accounts for the ******SPAM****** tag, and I take mail from my production server, run sa-learn over it on the dev server, and then put the dev server into production, will I have any problems with the Bayesian filter?
One final concern: if I train the filter with ~500 spam and ~500 ham, disable auto-learning, and then never really retrain it, will Bayes still be effective? I know it won't get good at identifying new spam, but it should still do okay with the spam it has already learned, right?
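The training plan above could be scripted roughly like this. It is a sketch, assuming `sa-learn` is on the PATH of the dev server, that it is run as the user whose Bayes database SpamAssassin will use, and that the mail exported from production has been saved as one-message-per-file directories; the paths and the helper names `training_commands` and `run` are hypothetical. `--spam`, `--ham`, `--dump magic`, and the local.cf setting `bayes_auto_learn 0` are real sa-learn/SpamAssassin options.

```python
import subprocess

# local.cf line that turns off auto-learning, so only the manual training below feeds Bayes
AUTO_LEARN_OFF = "bayes_auto_learn 0"


def training_commands(spam_dir, ham_dir):
    """Build the sa-learn invocations for one spam corpus and one ham corpus."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--dump", "magic"],  # prints database stats, including nspam/nham counts
    ]


def run(commands, dry_run=True):
    """Print the commands (default) or execute them, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical corpus locations on the dev server
    run(training_commands("/srv/corpus/spam", "/srv/corpus/ham"))
```

Running it as-is only prints the commands; passing `dry_run=False` actually trains, and the `--dump magic` output is a quick way to confirm the ~500/~500 counts landed in the database.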
Thanks for any help in advance.