bayes-irc - Home

Welcome to bayes-irc

Paul Graham's articles on Bayesian filtering of e-mail gave us the basic idea of setting up a similar system for spam filtering in chat clients. His article A Plan for Spam is a very good entry point for further reading. He also describes an improved algorithm in Better Bayesian Filtering.

The basic idea of Bayesian filtering is that each user decides what counts as spam. The user tells the system which messages should be filtered out and which should be shown. The system then learns to recognize bad parts from the spam examples and good parts from the non-spam (or ham) examples. The resulting classifier can then assign a score to any given input, indicating how likely that input is to be spam.
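To make the train-then-score cycle concrete, here is a minimal sketch in Python. It is a plain multinomial naive Bayes classifier with add-one smoothing, which is a simplification and not the exact formula from Paul Graham's articles, nor the bayes-irc implementation. The class name TinyBayesFilter, its methods, and the sample messages are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Hypothetical minimal naive Bayes filter (illustration only)."""

    def __init__(self):
        # Per-class token counts and number of training messages.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase words and digits are good enough tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # The user supplies the label: "spam" or "ham".
        self.counts[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_messages = sum(self.messages.values())
        tokens = self.tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Class prior, smoothed so an untrained class never gets zero.
            log_p = math.log((self.messages[label] + 1) / (total_messages + 2))
            total_tokens = sum(self.counts[label].values())
            for tok in tokens:
                # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
                seen = self.counts[label][tok] + 1
                log_p += math.log(seen / (total_tokens + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


f = TinyBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("anyone up for lunch", "ham")
f.train("the build is green again", "ham")

spam_score = f.spam_probability("buy cheap pills")
ham_score = f.spam_probability("lunch anyone")
print(spam_score > 0.5, ham_score < 0.5)
```

A chat client could compare the returned value against a threshold chosen by the user and hide lines that exceed it. Working in log space keeps the product of many small token probabilities from underflowing.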

E-mail is one of the fastest communication methods on the internet, and communication in internet chats is faster still. Applying the same mechanism to keep unwanted text out of chats is therefore challenging, but not impossible. If you would like to know how this can be achieved, and what you should consider when training a classifier and using this approach in real-time chat environments, you should read our introduction.