Chapter 4: Spam Prevention

The Spam Reality

Spam is the inevitable companion of any public comment system. Without effective prevention, your comment sections will fill with pharmaceutical ads, cryptocurrency scams, and SEO link farms. This chapter explores the multi-layered approach needed to keep your comments clean.

Understanding the Enemy

Types of Comment Spam

Automated Bot Spam:

Scripts that submit forms automatically
High volume, low sophistication
Often generic, irrelevant content
Usually includes links

Semi-Automated Spam:

Humans solving CAPTCHAs
Bots handling submission
Higher quality than pure automation
Harder to detect

Manual Spam:

Human spammers posting individually
Context-aware content
May appear legitimate initially
Most difficult to prevent

Coordinated Campaigns:

Multiple accounts working together
Gradual reputation building
Sophisticated evasion tactics
Often politically motivated

Spammer Motivations

Understanding why spammers target you helps design defenses:

SEO value: Links from your site boost their rankings
Direct traffic: Some visitors will click spam links
Credential harvesting: Fake links lead to phishing sites
Malware distribution: Links to malicious downloads
Reputation damage: Competitors or trolls who want your site to look neglected or untrustworthy
Cryptocurrency/scams: Pump-and-dump, fake investments

Defense in Depth

Effective spam prevention uses multiple layers. No single technique is sufficient.

Layer 1: Submission Barriers

Increase the cost of submitting comments:

Honeypot Fields: Hidden form fields that humans don’t see but bots fill in. Any submission with these fields completed is rejected.
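
The server-side half of a honeypot can be sketched as below. This is a minimal illustration, not a definitive implementation: the field name `website_url` and the dictionary-shaped form data are assumptions made for the example, and the field itself would be hidden from humans with CSS in the page template.

```python
from typing import Mapping

# Hypothetical field name; choose one that looks tempting to a form-filling bot.
HONEYPOT_FIELD = "website_url"


def is_honeypot_triggered(form: Mapping[str, str]) -> bool:
    """Return True when the hidden honeypot field contains any text."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A submission for which this returns True can be rejected outright or given a very high spam score.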

Time-Based Validation:

Minimum time between page load and submission
Maximum time (an extremely long delay is suspicious)
Bots often submit instantly
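
A rough sketch of the timing check follows. The thresholds (3 seconds and 6 hours) are assumed values for illustration only, and the render timestamp is assumed to come from the server (for example inside a signed form token) rather than from a field the client can edit freely.

```python
MIN_SECONDS = 3.0            # assumed: anything faster looks automated
MAX_SECONDS = 6 * 60 * 60    # assumed: a form left open this long is stale


def timing_verdict(rendered_at: float, submitted_at: float) -> str:
    """Classify a submission by the time elapsed since the form was rendered."""
    elapsed = submitted_at - rendered_at
    if elapsed < MIN_SECONDS:
        return "too_fast"
    if elapsed > MAX_SECONDS:
        return "too_slow"
    return "ok"
```

Treating "too_slow" as a reason to re-render the form, rather than as proof of spam, keeps slow readers from being punished.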

JavaScript Requirements: Require JavaScript execution to submit. Many simple bots don’t execute JavaScript, although bots built on headless browsers do. The trade-off is that this excludes legitimate users who browse with JavaScript disabled.

Form Token Rotation: Generate unique tokens that expire. Prevents replay attacks and forces fresh page loads.
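
One possible shape for expiring, single-use form tokens is an HMAC-signed timestamp plus a random nonce. The sketch below is an assumption-laden illustration: the secret key, the one-hour lifetime, the token layout, and the in-memory set of used nonces are all choices made for this example, and a real deployment would need shared storage for the nonces.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"change-me"   # assumption: loaded from server config in practice
TOKEN_TTL_SECONDS = 3600    # assumed lifetime of one hour
_used_nonces = set()        # in-memory only; illustrative


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create a token of the form '<issued>.<nonce>.<signature>'."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}.{secrets.token_hex(8)}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now=None) -> bool:
    """Accept a token once, and only if it is authentic and not expired."""
    try:
        issued_str, nonce, signature = token.split(".")
        issued = int(issued_str)
    except ValueError:
        return False
    expected = _sign(f"{issued_str}.{nonce}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    if not 0 <= current - issued <= TOKEN_TTL_SECONDS:
        return False
    if nonce in _used_nonces:   # replayed token
        return False
    _used_nonces.add(nonce)
    return True
```

The token is embedded in the form when the page is rendered and checked on submission, so a bot that replays a captured request is rejected on the second attempt.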

Layer 2: Content Analysis

Examine what’s being submitted:

Link Detection:

Count links in comment
Check link destinations
Flag shortened URLs
Block known spam domains
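
The link checks above could be combined into a single helper along these lines. The regular expression is deliberately simple, and both domain sets are small illustrative samples (the blocklist entry `spam.example` is hypothetical); a production list would be far longer and maintained separately.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co"}  # illustrative sample
BLOCKED_DOMAINS = {"spam.example"}                     # hypothetical blocklist


def _host(url: str) -> str:
    """Extract the lowercase hostname, ignoring a leading 'www.'."""
    try:
        host = urlparse(url).hostname or ""
    except ValueError:  # malformed URL
        return ""
    return host[4:] if host.startswith("www.") else host


def analyze_links(text: str) -> dict:
    """Count links and flag shortened or blocklisted destinations."""
    urls = URL_RE.findall(text)
    hosts = [_host(u) for u in urls]
    return {
        "link_count": len(urls),
        "has_shortener": any(h in SHORTENER_DOMAINS for h in hosts),
        "has_blocked_domain": any(h in BLOCKED_DOMAINS for h in hosts),
    }
```

Each key in the result can feed a separate signal in a scoring system instead of triggering an immediate rejection.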

Keyword Filtering:

Block known spam keywords
Pharmaceutical terms
Adult content terms
Common scam phrases

Pattern Matching:

Excessive capitalization
Repetitive characters
Known spam patterns
Suspicious formatting
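
Two of these patterns, excessive capitalization and repetitive characters, are easy to sketch. The cut-offs used here (70% uppercase letters, five identical characters in a row, a minimum length of 10) are assumptions for illustration and would need tuning against real comments.

```python
import re

# The same character repeated five or more times in a row, e.g. "!!!!!".
REPEATED_CHAR_RE = re.compile(r"(.)\1{4,}")


def uppercase_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def pattern_flags(text: str) -> dict:
    """Flag shouting and character runs; very short comments are not judged on caps."""
    return {
        "excessive_caps": len(text) >= 10 and uppercase_ratio(text) > 0.7,
        "repeated_chars": bool(REPEATED_CHAR_RE.search(text)),
    }
```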

Language Analysis:

Does comment relate to post content?
Grammatical anomalies common in spam
Generic praise (“Great post!”)
Template-like structure

Layer 3: Behavioral Analysis

Examine how submissions happen:

Rate Limiting:

Limit comments per IP per time period
Limit comments per user session
Limit total comments across the entire site
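
A sliding-window limiter is one simple way to enforce these limits. In the sketch below the key can be an IP address, a session ID, or a constant such as "site" for a global limit; the class name and the in-memory storage are assumptions for illustration, and a multi-process deployment would need a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(self, max_events: int, window_seconds: float):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: float) -> bool:
        timestamps = self._events[key]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_events:
            return False
        timestamps.append(now)
        return True
```

For example, `SlidingWindowLimiter(5, 600)` would permit five comments per key in any ten-minute window.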

Velocity Checks:

Unusual comment frequency
Geographic impossibility (same user, different continents)
Suspicious timing patterns

Browser Fingerprinting:

Recognize the same fingerprint across submissions, even when the IP changes
Detect browser automation
Note: Privacy implications

Mouse/Keyboard Patterns:

Bots often don’t generate natural interaction events
Track engagement before submission
Note: Can be spoofed

Layer 4: Reputation Systems

Build trust over time:

IP Reputation:

Track spam history by IP
Use external IP reputation services
Note: Shared IPs (VPNs, offices) complicate this

Email Reputation:

Disposable email detection
Domain age and reputation
Previous behavior from email

User Trust Scores:

New users require approval
Trust increases with good behavior
Trust decreases with flags/spam
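
A trust score can be as simple as a counter that moves up and down. The point values and the approval threshold below are assumed numbers chosen only to illustrate the idea.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 3  # assumed: approved comments needed before auto-approval


@dataclass
class UserTrust:
    score: int = 0  # new users start with no trust

    def record_approved(self) -> None:
        self.score += 1

    def record_flagged(self) -> None:
        self.score -= 2

    def record_spam(self) -> None:
        self.score -= 5

    @property
    def needs_approval(self) -> bool:
        """True while the user's comments should go through moderation."""
        return self.score < APPROVAL_THRESHOLD
```

Penalties are larger than rewards here so that a single confirmed spam comment undoes several good ones.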

Layer 5: External Services

Leverage specialized services:

CAPTCHA Systems:

reCAPTCHA, hCaptcha, Turnstile
Adds friction but is effective
Accessibility concerns
Privacy considerations

Spam Detection APIs:

Akismet (WordPress ecosystem)
CleanTalk
Stop Forum Spam
Often paid, though some, such as Stop Forum Spam, are free

Email Verification Services:

Check if email is valid
Detect disposable emails
Risk scoring

CAPTCHA Considerations

Types of CAPTCHA

Traditional Image CAPTCHA:

Distorted text to read
Accessibility nightmare
Increasingly solved by AI
Poor user experience

Image Selection:

“Select all images with X”
Better accessibility than distorted text, though still a barrier for visually impaired users
Still solved by services
Moderate user friction

Invisible/Risk-Based:

Analyzes behavior, shows challenge only if suspicious
Best user experience
Requires third-party service
Privacy concerns

Proof of Work:

Browser solves a computational puzzle
No user interaction
Increases submission cost
Energy/battery concerns
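
A hashcash-style puzzle is one common way to realize proof of work: the client must find a nonce whose SHA-256 hash, combined with a server-issued challenge, starts with a given number of zero bits. The sketch below shows both sides in Python for clarity; in a real page the `solve` step would run in the browser, and the challenge format and difficulty are assumptions.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one satisfies the difficulty."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Each extra bit of difficulty doubles the expected work for the client while verification stays a single hash, which is what makes the cost asymmetric.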

CAPTCHA Trade-offs

Pros:

Effective against basic bots
Well-understood by users
Easy to implement (third-party)
Adjustable difficulty

Cons:

Annoying for legitimate users
Accessibility challenges
Privacy concerns (tracking)
Solved by human farms
Doesn’t stop manual spam

CAPTCHA Recommendations

Consider CAPTCHA-free alternatives first
If you do need a CAPTCHA, start with an invisible/risk-based one
Only show challenges when a submission looks suspicious
Have accessible alternatives
Don’t rely solely on CAPTCHA

Building Your Spam Score

Combine multiple signals into a spam probability score:

Signal Weighting

Assign points to various signals:

Contains links: +20 points
New user: +15 points
Failed honeypot: +100 points (almost certainly spam)
Contains spam keywords: +30 points
Submitted too quickly: +25 points
From known spam IP: +50 points
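
The point values above translate directly into a lookup table. In this sketch the signal names are hypothetical identifiers invented for the example; the weights are the ones listed.

```python
# Weights taken from the list above; the key names are illustrative.
SIGNAL_WEIGHTS = {
    "contains_links": 20,
    "new_user": 15,
    "failed_honeypot": 100,
    "spam_keywords": 30,
    "submitted_too_quickly": 25,
    "known_spam_ip": 50,
}


def spam_score(signals) -> int:
    """Sum the weights of the triggered signals (each counted once).

    Raises KeyError for an unknown signal name so typos are not silently ignored.
    """
    return sum(SIGNAL_WEIGHTS[name] for name in set(signals))
```

For instance, a new user who posts a link within a second of loading the page scores 15 + 20 + 25 = 60.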

Threshold Actions

Based on total score:

0-30: Auto-approve
31-70: Hold for moderation
71+: Auto-reject (or shadow-ban)
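
The three bands map onto a small function. The action names returned here are placeholders chosen for the sketch; the boundaries are the ones given above.

```python
def action_for_score(score: int) -> str:
    """Map a total spam score to an action using the bands above."""
    if score <= 30:
        return "approve"
    if score <= 70:
        return "moderate"
    return "reject"
```

Keeping the boundaries in one place makes them easy to adjust as false positives and false negatives are measured.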

Tuning Over Time

Track false positives (legitimate comments blocked)
Track false negatives (spam that got through)
Adjust weights based on your data
Different sites need different tuning
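
To make tuning concrete, the two error rates can be computed from comments that a moderator has reviewed. The record format below, a pair of booleans per comment, is an assumption made for this sketch.

```python
def error_rates(records) -> dict:
    """Compute error rates from (was_blocked, is_spam) pairs.

    was_blocked: the filter held or rejected the comment.
    is_spam: a human reviewer confirmed the comment as spam.
    """
    false_pos = false_neg = legit = spam = 0
    for was_blocked, is_spam in records:
        if is_spam:
            spam += 1
            false_neg += not was_blocked   # spam that got through
        else:
            legit += 1
            false_pos += was_blocked       # legitimate comment blocked
    return {
        "false_positive_rate": false_pos / legit if legit else 0.0,
        "false_negative_rate": false_neg / spam if spam else 0.0,
    }
```

A rising false positive rate suggests weights or thresholds are too aggressive; a rising false negative rate suggests the opposite.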

Shadow Banning

A technique where spammers don’t know they’re blocked:

How it works:

Spam is accepted normally (from spammer’s view)
Spammer sees their comment on the page
No one else sees it
Spammer thinks they’re successful
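
The per-viewer visibility rule described above can be sketched as a filter applied when rendering a comment list. The `Comment` fields and the use of an author ID to recognize the viewer are assumptions for this example; anonymous commenters would need a session or cookie identifier instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    author_id: str
    body: str
    shadow_banned: bool = False


def visible_comments(comments: List[Comment], viewer_id: Optional[str]) -> List[Comment]:
    """Hide shadow-banned comments from everyone except their own author."""
    return [
        c for c in comments
        if not c.shadow_banned or c.author_id == viewer_id
    ]
```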

Advantages:

Spammer doesn’t adapt tactics
No feedback loop for them
Reduces return attempts
Satisfying (arguably)

Disadvantages:

Ethical concerns
Can catch legitimate users
Requires per-user view logic
Complexity in implementation

Recommendation: Use sparingly and review shadow-banned submissions periodically.

Handling False Positives

Some legitimate comments will inevitably be caught as spam:

Prevention

Err toward approval for borderline cases
Clear feedback when comments are held
Quick moderation turnaround
Whitelist known good users

Recovery

Easy appeal process
Notification when approved
Apology for inconvenience
Adjust rules that caused false positive

Spam Prevention Without CAPTCHA

It’s possible to have effective spam prevention without user-facing challenges:

The Invisible Approach

Honeypot fields
Time validation
JavaScript token
Content analysis
Rate limiting
Behavior analysis
Manual moderation for flagged items

This combination catches most automated spam while providing a friction-free experience for legitimate users.

Cost Considerations

Free Options

Honeypots
Time validation
Basic keyword filtering
Rate limiting
Manual moderation

Low-Cost Options

Basic CAPTCHA services (free tiers)
Simple IP reputation checks
Disposable email detection
Open-source spam detection

Paid Services

Advanced CAPTCHA (reCAPTCHA Enterprise)
Spam detection APIs
IP reputation services
Machine learning solutions

For most small sites, free options combined with manual moderation are sufficient.

Spam Prevention Checklist

Implemented honeypot fields
Added time-based validation
Content analysis for links and keywords
Rate limiting in place
Considered CAPTCHA (and alternatives)
Moderation queue for suspicious comments
False positive recovery process
Monitoring and metrics
Plan for tuning over time

Summary

Effective spam prevention requires:

Multiple layers: No single technique is sufficient
Balance: Don’t sacrifice UX for security
Adaptation: Spammers evolve, and so must you
Monitoring: Track what’s working and what’s not
Moderation: Human review remains important

Start with the free, invisible techniques (honeypots, time validation, content analysis). Add visible challenges only if needed. Remember that the goal is minimizing spam while maximizing legitimate participation.

The next chapter covers moderation systems—what happens after a comment passes your spam filters.

Gaëlle Candel