Generated using AI. Be aware that everything might not be accurate.



Chapter 4: Spam Prevention

The Spam Reality

Spam is the inevitable companion of any public comment system. Without effective prevention, your comment sections will fill with pharmaceutical ads, cryptocurrency scams, and SEO link farms. This chapter explores the multi-layered approach needed to keep your comments clean.

Understanding the Enemy

Types of Comment Spam

Automated Bot Spam:

  • Scripts that submit forms automatically
  • High volume, low sophistication
  • Often generic, irrelevant content
  • Usually includes links

Semi-Automated Spam:

  • Humans solving CAPTCHAs
  • Bots handling submission
  • Higher quality than pure automation
  • Harder to detect

Manual Spam:

  • Human spammers posting individually
  • Context-aware content
  • May appear legitimate initially
  • Most difficult to prevent

Coordinated Campaigns:

  • Multiple accounts working together
  • Gradual reputation building
  • Sophisticated evasion tactics
  • Often politically motivated

Spammer Motivations

Understanding why spammers target you helps design defenses:

  • SEO value: Links from your site boost their rankings
  • Direct traffic: Some visitors will click spam links
  • Credential harvesting: Fake links lead to phishing sites
  • Malware distribution: Links to malicious downloads
  • Reputation damage: Competitors or trolls
  • Cryptocurrency/scams: Pump-and-dump, fake investments

Defense in Depth

Effective spam prevention uses multiple layers. No single technique is sufficient.

Layer 1: Submission Barriers

Increase the cost of submitting comments:

Honeypot Fields: Hidden form fields that humans don’t see but bots fill in. Any submission with these fields completed is rejected.
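A minimal server-side check might look like the Python sketch below. It assumes the submitted form arrives as a dictionary and that the hidden input is named `website`. That name is hypothetical, chosen because bots tend to fill URL-like fields.

```python
# Hypothetical name for the honeypot input. The page hides it from humans with CSS,
# so a real visitor submits it empty.
HONEYPOT_FIELD = "website"


def failed_honeypot(form: dict) -> bool:
    """Return True if the hidden honeypot field was filled in."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""


print(failed_honeypot({"name": "Ana", "comment": "Nice write-up", "website": ""}))  # False
print(failed_honeypot({"name": "Bot", "comment": "Buy now", "website": "http://x.example"}))  # True
```

A common practice is to hide the field with CSS rather than `type="hidden"` and to add `autocomplete="off"` and `tabindex="-1"`, which lowers the chance that browser autofill or keyboard users fill it by accident.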

Time-Based Validation:

  • Minimum time between page load and submission
  • Maximum time (an extremely long delay can indicate a stale or replayed form)
  • Bots often submit instantly
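The time window reduces to a simple comparison. This sketch assumes the render timestamp is recorded server-side or carried in a signed token, because a plain hidden timestamp field could be edited by the bot. The 3-second and one-hour limits are illustrative assumptions.

```python
import time
from typing import Optional

# Assumed thresholds; tune them for your own audience.
MIN_SECONDS = 3.0     # submissions faster than this look automated
MAX_SECONDS = 3600.0  # forms older than an hour look stale or replayed


def timing_ok(rendered_at: float, submitted_at: Optional[float] = None) -> bool:
    """Return True if the render-to-submit delay falls inside the allowed window.

    rendered_at must come from a trusted source (server session or signed token),
    never from a hidden field the client can edit.
    """
    if submitted_at is None:
        submitted_at = time.time()
    elapsed = submitted_at - rendered_at
    return MIN_SECONDS <= elapsed <= MAX_SECONDS


print(timing_ok(rendered_at=1000.0, submitted_at=1000.8))  # False: under a second
print(timing_ok(rendered_at=1000.0, submitted_at=1045.0))  # True
```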

JavaScript Requirements: Require JavaScript execution to submit. Many bots don’t execute JavaScript. However, this excludes users without JavaScript.

Form Token Rotation: Generate unique tokens that expire. This prevents replay attacks and forces a fresh page load for each submission.
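One way to build expiring, single-use tokens is to sign a timestamp and a random nonce with HMAC, as in this sketch. The secret key, the one-hour lifetime, and the in-memory nonce set are all assumptions; a multi-server deployment would need a shared store with expiry for used nonces.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumption: load a stable key from config in production
TOKEN_TTL = 3600                      # assumed token lifetime in seconds
_used_nonces: set = set()             # in-memory record of consumed nonces


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Create a token of the form 'issued:nonce:signature' for embedding in the form."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}:{secrets.token_hex(8)}"
    return f"{payload}:{_sign(payload)}"


def consume_token(token: str, now: Optional[float] = None) -> bool:
    """Accept a token once, only if the signature is valid and it has not expired."""
    try:
        issued_str, nonce, signature = token.split(":")
        issued = int(issued_str)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{issued_str}:{nonce}")
    # Compare as bytes in constant time; encode() also tolerates non-ASCII input.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered
    current = time.time() if now is None else now
    if not 0 <= current - issued <= TOKEN_TTL:
        return False  # expired (or issued in the future)
    if nonce in _used_nonces:
        return False  # replay of an already-used token
    _used_nonces.add(nonce)
    return True
```

Because the token carries its issue time, the same value can also drive the time-based validation described earlier.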

Layer 2: Content Analysis

Examine what’s being submitted:

Link Detection:

  • Count links in comment
  • Check link destinations
  • Flag shortened URLs
  • Block known spam domains
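The first three checks can be sketched with a regular expression and `urllib.parse`. The shortener and blocklist sets below are tiny illustrative samples (the `spam-pharmacy.example` entry is hypothetical); real lists come from maintained sources, and following a shortened link to its final destination would need a separate network request.

```python
import re
from urllib.parse import urlparse

# Illustrative lists only; real deployments load these from maintained sources.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co"}
BLOCKED_DOMAINS = {"spam-pharmacy.example"}  # hypothetical blocklist entry

# Only catches links with an explicit http(s) scheme; bare "example.com" text slips through.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def _host(url: str) -> str:
    try:
        return urlparse(url).hostname or ""  # hostname comes back lowercased
    except ValueError:  # e.g. malformed IPv6 brackets
        return ""


def _matches(host: str, domains: set) -> bool:
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def analyze_links(text: str) -> dict:
    urls = URL_RE.findall(text)
    hosts = [_host(u) for u in urls]
    return {
        "link_count": len(urls),
        "shortened": [h for h in hosts if _matches(h, SHORTENER_DOMAINS)],
        "blocked": [h for h in hosts if _matches(h, BLOCKED_DOMAINS)],
    }
```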

Keyword Filtering:

  • Block known spam keywords
  • Pharmaceutical terms
  • Adult content terms
  • Common scam phrases
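A keyword filter is usually a list of phrases matched case-insensitively. This sketch uses whole-word matching so that a listed term does not fire inside an innocent longer word; the phrase list is illustrative only.

```python
import re

# Illustrative phrases only; build your own list from spam you actually receive.
SPAM_PHRASES = ["viagra", "casino bonus", "guaranteed returns", "work from home"]

# \b anchors keep matches to whole words, avoiding substring false positives.
_PATTERNS = [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in SPAM_PHRASES]


def spam_keywords(text: str) -> list:
    """Return the listed phrases that appear in the text."""
    return [phrase for phrase, pat in zip(SPAM_PHRASES, _PATTERNS) if pat.search(text)]
```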

Pattern Matching:

  • Excessive capitalization
  • Repetitive characters
  • Known spam patterns
  • Suspicious formatting
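Capitalization and repetition checks need only a few lines. The 70% capitals threshold, the 10-letter minimum (so a short "OK" or "LOL" is not flagged), and the five-character repetition rule are assumptions for illustration.

```python
import re

REPEATED_CHARS = re.compile(r"(.)\1{4,}")  # the same character 5+ times in a row, e.g. "!!!!!"


def caps_ratio(text: str) -> float:
    """Fraction of letters that are uppercase (0.0 if there are no letters)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def pattern_flags(text: str, caps_threshold: float = 0.7, min_letters: int = 10) -> list:
    flags = []
    # Only judge capitalization when there is enough text for the ratio to mean something.
    if sum(c.isalpha() for c in text) >= min_letters and caps_ratio(text) > caps_threshold:
        flags.append("excessive_caps")
    if REPEATED_CHARS.search(text):
        flags.append("repeated_chars")
    return flags
```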

Language Analysis:

  • Does the comment relate to the post content?
  • Grammatical anomalies common in spam
  • Generic praise (“Great post!”)
  • Template-like structure
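Relevance and template detection are hard to do well, and production systems often rely on trained classifiers. As a crude, deliberately simplified stand-in, this sketch measures word overlap between the comment and the post and looks for stock praise phrases. The stopword and phrase lists are illustrative assumptions.

```python
import re

# Illustrative lists; collect real templates from your own spam queue.
GENERIC_PRAISE = ["great post", "nice article", "very informative", "thanks for sharing"]
STOPWORDS = {"the", "and", "for", "this", "that", "with", "you", "are", "was"}
WORD_RE = re.compile(r"[a-z']+")


def content_words(text: str) -> set:
    """Lowercased words longer than two letters, minus stopwords."""
    return {w for w in WORD_RE.findall(text.lower()) if len(w) > 2 and w not in STOPWORDS}


def relevance(comment: str, post: str) -> float:
    """Fraction of the comment's content words that also appear in the post."""
    words = content_words(comment)
    if not words:
        return 0.0
    return len(words & content_words(post)) / len(words)


def is_generic_praise(comment: str) -> bool:
    lowered = comment.lower()
    return any(phrase in lowered for phrase in GENERIC_PRAISE)
```

A low overlap score is a weak signal on its own, since on-topic comments can use different vocabulary, so it works better as one input to a combined score than as a reason to reject.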

Layer 3: Behavioral Analysis

Examine how submissions happen:

Rate Limiting:

  • Limit comments per IP per time period
  • Limit comments per user session
  • Limit across the entire site
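A sliding-window counter keyed by IP address, session ID, or a single site-wide key covers all three limits. This in-memory sketch assumes one server process; with several servers the counters would need to live in a shared store such as Redis. The limits shown are placeholders.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


# Placeholder limits: 5 comments per IP per 10 minutes, 200 per minute site-wide.
per_ip = SlidingWindowLimiter(limit=5, window=600)
site_wide = SlidingWindowLimiter(limit=200, window=60)
```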

Velocity Checks:

  • Unusual comment frequency
  • Geographic impossibility (the same user posting from different continents within minutes)
  • Suspicious timing patterns
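The geographic check compares the distance between two posts' geolocated positions with the time between them. This sketch uses the haversine formula; the 1,000 km/h ceiling and the 100 km noise tolerance are assumptions, and VPNs or mobile networks can produce apparent jumps, so the result is best treated as a signal rather than a verdict.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: roughly the speed of a commercial flight
GEO_NOISE_KM = 100.0        # assumption: ignore jumps within typical IP-geolocation error


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def impossible_travel(prev: tuple, curr: tuple) -> bool:
    """prev and curr are (lat, lon, unix_seconds) for two posts by the same user."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if distance < GEO_NOISE_KM:
        return False
    hours = max(abs(curr[2] - prev[2]) / 3600, 1e-9)  # avoid division by zero
    return distance / hours > MAX_PLAUSIBLE_KMH
```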

Browser Fingerprinting:

  • Consistent fingerprint across submissions
  • Detect browser automation
  • Note: Privacy implications

Mouse/Keyboard Patterns:

  • Bots often don’t generate natural interaction events
  • Track engagement before submission
  • Note: Can be spoofed

Layer 4: Reputation Systems

Build trust over time:

IP Reputation:

  • Track spam history by IP
  • Use external IP reputation services
  • Note: Shared IPs (VPNs, offices) complicate this

Email Reputation:

  • Disposable email detection
  • Domain age and reputation
  • Previous behavior from email
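Disposable email detection usually means checking the address's domain against a list. The three domains below are well-known disposable providers, included only as a sample; maintained lists are far longer and change often. Domain age and past behavior need external lookups and stored history, which this sketch omits.

```python
# Tiny illustrative sample; real lists contain thousands of domains.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com"}


def email_domain(address: str) -> str:
    """Everything after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_disposable(address: str) -> bool:
    domain = email_domain(address)
    # Also catch subdomains of a listed provider.
    return any(domain == d or domain.endswith("." + d) for d in DISPOSABLE_DOMAINS)
```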

User Trust Scores:

  • New users require approval
  • Trust increases with good behavior
  • Trust decreases with flags/spam
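A trust score can start as a simple tally. In this sketch, with hypothetical thresholds, a user needs three approved comments before posting without review, and each confirmed spam flag costs two points.

```python
from dataclasses import dataclass

# Hypothetical values; adjust to your moderation data.
APPROVAL_THRESHOLD = 3  # net trust needed to skip the moderation queue
FLAG_PENALTY = 2        # trust lost per confirmed spam flag


@dataclass
class UserTrust:
    approved: int = 0
    penalties: int = 0

    @property
    def score(self) -> int:
        return self.approved - self.penalties

    def record_approved(self) -> None:
        self.approved += 1

    def record_flagged(self) -> None:
        self.penalties += FLAG_PENALTY

    def needs_moderation(self) -> bool:
        """New and recently flagged users go to the moderation queue."""
        return self.score < APPROVAL_THRESHOLD
```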

Layer 5: External Services

Leverage specialized services:

CAPTCHA Systems:

  • reCAPTCHA, hCaptcha, Turnstile
  • Adds friction but effective
  • Accessibility concerns
  • Privacy considerations

Spam Detection APIs:

  • Akismet (WordPress ecosystem)
  • CleanTalk
  • Stop Forum Spam
  • Pricing varies: some have free tiers for personal or low-volume use, others are paid

Email Verification Services:

  • Check if email is valid
  • Detect disposable emails
  • Risk scoring

CAPTCHA Considerations

Types of CAPTCHA

Traditional Image CAPTCHA:

  • Distorted text to read
  • Accessibility nightmare
  • Increasingly solved by AI
  • Poor user experience

Image Selection:

  • “Select all images with X”
  • Better accessibility
  • Still solved by services
  • Moderate user friction

Invisible/Risk-Based:

  • Analyzes behavior, shows challenge only if suspicious
  • Best user experience
  • Requires third-party service
  • Privacy concerns

Proof of Work:

  • Browser solves a computational puzzle in the background
  • No user interaction
  • Increases submission cost
  • Energy/battery concerns
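A hashcash-style scheme is one common form of proof of work: the server issues a random challenge, the client searches for a nonce whose SHA-256 hash starts with enough zero bits, and the server verifies with a single hash. In practice the solver runs as JavaScript in the browser; it is written in Python here only to keep the sketch in one language. The 16-bit difficulty is an assumption.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 16  # assumed; each extra bit roughly doubles the expected client work


def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first set bit in this byte
        break
    return bits


def new_challenge() -> str:
    """Server side: a random challenge, stored so it can be used only once."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce meeting the difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY_BITS) -> bool:
    """Server side: a single hash checks the client's answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Verification stays cheap no matter how high the difficulty is set. The server should also record which challenges have been solved so an answer cannot be replayed.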

CAPTCHA Trade-offs

Pros:

  • Effective against basic bots
  • Well-understood by users
  • Easy to implement (third-party)
  • Adjustable difficulty

Cons:

  • Annoying for legitimate users
  • Accessibility challenges
  • Privacy concerns (tracking)
  • Solved by human farms
  • Doesn’t stop manual spam

CAPTCHA Recommendations

  • Use invisible/risk-based CAPTCHA initially
  • Only show challenges when suspicious
  • Have accessible alternatives
  • Don’t rely solely on CAPTCHA
  • Consider CAPTCHA-free alternatives first

Building Your Spam Score

Combine multiple signals into a spam probability score:

Signal Weighting

Assign points to various signals:

  • Contains links: +20 points
  • New user: +15 points
  • Failed honeypot: +100 points (almost certainly spam)
  • Contains spam keywords: +30 points
  • Submitted too quickly: +25 points
  • From known spam IP: +50 points

Threshold Actions

Based on total score:

  • 0-30: Auto-approve
  • 31-70: Hold for moderation
  • 71+: Auto-reject (or shadow-ban)
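The weights and bands above translate directly into code. This sketch copies the numbers from the two lists; the signal names are hypothetical labels for whichever checks produce them.

```python
# Weights copied from the signal list above; treat them as starting points.
WEIGHTS = {
    "contains_links": 20,
    "new_user": 15,
    "failed_honeypot": 100,
    "spam_keywords": 30,
    "too_fast": 25,
    "known_spam_ip": 50,
}


def spam_score(signals: set) -> int:
    # Unknown signal names raise KeyError, so typos surface early.
    return sum(WEIGHTS[s] for s in signals)


def decide(score: int) -> str:
    """Map a total score onto the threshold bands above."""
    if score <= 30:
        return "approve"
    if score <= 70:
        return "moderate"
    return "reject"
```

For example, a new user whose comment contains a link and arrives too quickly scores 20 + 15 + 25 = 60, which falls in the 31-70 band and is held for moderation.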

Tuning Over Time

  • Track false positives (legitimate comments blocked)
  • Track false negatives (spam that got through)
  • Adjust weights based on your data
  • Different sites need different tuning
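Tracking false positives and false negatives means recording, for each comment, what the filter decided and what a moderator later confirmed. A minimal sketch of the two rates, using a hypothetical `Outcome` record, might look like this:

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    predicted_spam: bool  # the filter blocked or held the comment
    actual_spam: bool     # a moderator confirmed it was spam


def error_rates(outcomes: list) -> dict:
    legit = [o for o in outcomes if not o.actual_spam]
    spam = [o for o in outcomes if o.actual_spam]
    false_pos = sum(o.predicted_spam for o in legit)   # legitimate but caught
    false_neg = sum(not o.predicted_spam for o in spam)  # spam that got through
    return {
        "false_positive_rate": false_pos / len(legit) if legit else 0.0,
        "false_negative_rate": false_neg / len(spam) if spam else 0.0,
    }
```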

Shadow Banning

A technique where spammers don’t know they’re blocked:

How it works:

  • Spam is accepted normally (from spammer’s view)
  • Spammer sees their comment on the page
  • No one else sees it
  • Spammer thinks they’re successful
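The per-user view logic reduces to one filter applied whenever comments are listed. In this sketch, with a hypothetical `Comment` type, a shadow-banned comment is visible only to its author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    author_id: str
    text: str
    shadow_banned: bool = False


def visible_comments(comments: list, viewer_id: Optional[str]) -> list:
    """Hide shadow-banned comments from everyone except their author."""
    return [c for c in comments if not c.shadow_banned or c.author_id == viewer_id]
```

Comment counts, feeds, and page caches need the same filter; otherwise the hidden comment can leak to other visitors, or the spammer may notice that others see a different count.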

Advantages:

  • Spammer doesn’t adapt tactics
  • No feedback loop for them
  • Reduces return attempts
  • Satisfying (arguably)

Disadvantages:

  • Ethical concerns
  • Can catch legitimate users
  • Requires per-user view logic
  • Complexity in implementation

Recommendation: Use sparingly and review shadow-banned submissions periodically.

Handling False Positives

False positives are legitimate comments caught as spam. Handle them in two stages:

Prevention

  • Err toward approval for borderline cases
  • Clear feedback when comments held
  • Quick moderation turnaround
  • Whitelist known good users

Recovery

  • Easy appeal process
  • Notification when approved
  • An apology for the inconvenience
  • Adjust the rules that caused the false positive

Spam Prevention Without CAPTCHA

It’s possible to have effective spam prevention without user-facing challenges:

The Invisible Approach

  1. Honeypot fields
  2. Time validation
  3. JavaScript token
  4. Content analysis
  5. Rate limiting
  6. Behavior analysis
  7. Manual moderation for flagged items

This combination catches most automated spam while providing a friction-free experience for legitimate users.

Cost Considerations

Free Options

  • Honeypots
  • Time validation
  • Basic keyword filtering
  • Rate limiting
  • Manual moderation

Low-Cost Options

  • Basic CAPTCHA services (free tiers)
  • Simple IP reputation checks
  • Disposable email detection
  • Open-source spam detection

Paid Options

  • Advanced CAPTCHA (reCAPTCHA Enterprise)
  • Spam detection APIs
  • IP reputation services
  • Machine learning solutions

For most small sites, free options combined with manual moderation are sufficient.

Spam Prevention Checklist

  • Implemented honeypot fields
  • Added time-based validation
  • Content analysis for links and keywords
  • Rate limiting in place
  • Considered CAPTCHA (and alternatives)
  • Moderation queue for suspicious comments
  • False positive recovery process
  • Monitoring and metrics
  • Plan for tuning over time

Summary

Effective spam prevention requires:

  1. Multiple layers: No single technique is sufficient
  2. Balance: Don’t sacrifice UX for security
  3. Adaptation: Spammers evolve, so must you
  4. Monitoring: Track what’s working and what’s not
  5. Moderation: Human review remains important

Start with the free, invisible techniques (honeypots, time validation, content analysis). Add visible challenges only if needed. Remember that the goal is minimizing spam while maximizing legitimate participation.

The next chapter covers moderation systems—what happens after a comment passes your spam filters.


