As spammers increasingly target dynamic web pages, including wiki pages and other CGI programs, it is becoming more important to track chronic offenders. If you have experienced web spam, record any relevant information here (such as the IP address and the date and time of the attack). If you are getting hit many times per minute from the same or different IP addresses, consider alerting the administrators over IRC or the HCoop mailing lists. In this case, we may have to act quickly to avoid exceeding bandwidth limits.
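To judge whether you are being hit "many times per minute", one option is to count requests per IP address per minute in your web server's access log. The following is only a rough sketch: it assumes log lines in the common Apache format (client IP first, timestamp in square brackets), and the function names and the threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache common/combined log format, e.g.
# 192.0.2.1 - - [10/Oct/2006:13:55:36 -0400] "POST /wiki HTTP/1.1" 200 123
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def hits_per_minute(lines):
    """Count requests per (ip, minute); lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        ip, timestamp = match.groups()
        minute = timestamp[:17]  # "10/Oct/2006:13:55", seconds dropped
        counts[(ip, minute)] += 1
    return dict(counts)


def heavy_hitters(lines, threshold=30):
    """Return (ip, minute, count) for every pair at or above the threshold."""
    counts = hits_per_minute(lines)
    return sorted(
        (ip, minute, n) for (ip, minute), n in counts.items() if n >= threshold
    )
```

Calling `heavy_hitters(open(path))` on an access log gives a list of IP addresses and times that could be pasted into the log section of this page. The default threshold of 30 is arbitrary; pick whatever counts as abnormal for your site.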
1. WebSpam Ideas
1.1. Wikis
What are some ideas to defeat spam that defaces web pages?
- The first thing that comes to mind is enabling wiki user authentication. It would require a few extra steps to contribute to a page, but could be worthwhile if the level of spam is out of control. You could have open enrollment, meaning everyone who signs up for an account can automatically contribute to the wiki.
You might notice that hardly any spamming attempts succeed on our MoinMoin wikis. That's because MoinMoin has this wonderful content-based filtering support that consults a central database of all known spam content. I think this kind of program-specific support is the way to go. --AdamChlipala
If a wiki is very popular, even MoinMoin's content filtering isn't enough. What I generally recommend is making the FrontPage writable only to registered users, and let other pages be writable by anyone. This at least helps to save face when new visitors come to your site. --MichaelOlson
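For programs without built-in support, the content-based filtering idea discussed above can be approximated with a list of regular expressions that is checked before an edit is saved. This is a minimal sketch and not how MoinMoin itself implements it; the patterns and function names are hypothetical, and a real deployment would load a shared, regularly updated list rather than hard-code one.

```python
import re

# Hypothetical patterns for illustration only.
SPAM_PATTERNS = [
    r"cheap-pills\.example",
    r"casino-[a-z]+\.example",
]


def compile_patterns(patterns):
    """Compile each pattern once, matching case-insensitively."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def find_spam(text, compiled):
    """Return the first pattern that matches the text, or None."""
    for regex in compiled:
        if regex.search(text):
            return regex.pattern
    return None


def may_save(text, compiled):
    """An edit may be saved only if no spam pattern matches it."""
    return find_spam(text, compiled) is None
```

A save handler would call `may_save(new_page_text, compiled)` and reject the edit when it returns `False`. As noted above, a filter like this tends to lag behind on very popular sites, so it works best combined with restricting who can edit the most visible pages.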
1.2. Blog comments
I use pyblosxom, which has the option of putting comments in a draft status. The blog maintainer must manually give comments a ".cmt" extension for them to show up. I'm beginning to think that it was a mistake to put a link to a page on my blog called "Guestbook", because I'm now getting about 300 spam comments a day to moderate! Hopefully getting rid of that link will help. --MichaelOlson
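With hundreds of drafts a day, renaming files by hand gets tedious. The sketch below approves drafts in bulk by renaming them to the ".cmt" extension. It rests on assumptions: the draft extension (here ".draft") and the directory layout are guesses that depend on how the comments plugin is configured, and `approve_comments` and `is_ok` are hypothetical names, not part of pyblosxom.

```python
import os


def approve_comments(comment_dir, draft_ext=".draft", live_ext=".cmt",
                     is_ok=lambda path: True):
    """Rename approved draft comments so they carry the live extension.

    draft_ext is an assumption; check what your comments plugin writes.
    is_ok is a hook deciding whether a given draft should be published.
    Returns the list of new paths.
    """
    approved = []
    for name in sorted(os.listdir(comment_dir)):
        base, ext = os.path.splitext(name)
        if ext != draft_ext:
            continue  # not a draft comment, leave it alone
        src = os.path.join(comment_dir, name)
        if not is_ok(src):
            continue  # stays in draft status for manual review
        dst = os.path.join(comment_dir, base + live_ext)
        os.rename(src, dst)
        approved.append(dst)
    return approved
```

The `is_ok` hook is where a spam check would go, for example one that reads the file and rejects comments containing known spam URLs, so that only the remainder needs a human look.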
Aren't CAPTCHAs enough? I thought spammers had yet to catch up with simple mechanisms like the one here. -- AnilNarayanan
Captchas are a suboptimal solution because they block blind users from participating. --MichaelOlson
- The Wikipedia page linked above mentions that audio captchas are possible as well, but they may not be as widely implemented as purely visual ones at present.
I guess accessibility might not be an issue if only text (question-answer pairs, say) is involved. It might work well until one's site generates enough interest for people to sit down and crack the captchas. -- AnilNarayanan
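A text-only question-answer check of the kind suggested above could look roughly like this. It is a sketch under assumptions: the questions, `pick_question`, and `check_answer` are invented for illustration, and the question index is assumed to be kept server-side (for example in the session) between showing the form and receiving the comment.

```python
import random

# Hypothetical question/answer pairs; answers are stored in lower case.
QUESTIONS = [
    ("What is three plus four? (answer in digits)", "7"),
    ("Type the word 'wiki' backwards.", "ikiw"),
]


def pick_question(rng=random):
    """Choose a question; return its index and the text to display."""
    idx = rng.randrange(len(QUESTIONS))
    return idx, QUESTIONS[idx][0]


def check_answer(idx, reply):
    """Compare the reply with the stored answer, ignoring case and spaces."""
    return reply.strip().lower() == QUESTIONS[idx][1]
```

Because the questions are plain text, screen readers handle them fine. The weakness is the one already pointed out: with a small fixed list, anyone who takes an interest in the site can collect the answers, so the list would need to be site-specific and changed from time to time.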
2. WebSpam Log
The following entries record spam attempts, as reported by HCoop users:
3. External References
LinkSleeve: a community-based project to block spam on different kinds of dynamic web sites.