Writing Custom Spam Rules: A Plain-English Guide to Regex Heuristics

When You Need Custom Rules

Indition Spam Killer's built-in heuristics are tuned for general-purpose spam detection. They catch the broad population of junk mail effectively. But there are categories of unwanted mail that are too specific to your domain or industry to be captured by generic rules:

Brand impersonation targeting your company. Phishing campaigns that reference your company name, your CEO, your products, or your client portal by name aren't going to be caught by generic patterns — those details are unique to you.
Industry-specific scams. A law firm gets targeted by "urgent settlement" phishing. A healthcare practice sees fake "patient portal" credential requests. A real estate agency receives wire transfer redirect attempts. These are specialized social engineering patterns invisible to general-purpose filters.
Active campaigns you're currently seeing. When your users start forwarding you examples of a specific phishing campaign — same subject structure, same lure, slightly different sending domains — a targeted custom rule can stop it immediately rather than waiting for the generic pattern database to update.
Internal policy filtering. You might want to flag or block mail containing certain terms for compliance reasons, or add score to messages from specific TLDs with high abuse rates in your sector.

The Configuration Format

Custom rules are defined in YAML under the custom_rules key in your domain configuration. Each rule has five fields:

custom_rules:
  - name: "descriptive-rule-name"
    description: "Human-readable explanation of what this catches"
    target: body          # Options: subject, body, both
    pattern: "regex here"
    score: 0.4            # 0.0 to 1.0; this is added to composite score

The target field determines where the regex is applied. Use subject for subject-line-only patterns, body for patterns in the message body, and both to check across the entire message text (subject and decoded body combined). For most content patterns, both is appropriate.

The score field is a float from 0.0 to 1.0. This value is added to the message's running composite score. The default spam threshold is 0.75 — so a single rule with score 0.8 will flag a message on its own, while a rule with score 0.3 adds weight but requires other signals to push the message over the threshold.
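
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the additive model described above, not the product's implementation: the function names, the sample rules, and the choice to join subject and body with a newline for target both are assumptions made for the example.

```python
import re

SPAM_THRESHOLD = 0.75  # default threshold stated in this guide

def text_for_target(target, subject, body):
    """Pick the text a rule is matched against, based on its target field."""
    if target == "subject":
        return subject
    if target == "body":
        return body
    if target == "both":
        # Assumption: subject and body are simply joined with a newline.
        return subject + "\n" + body
    raise ValueError(f"unknown target: {target!r}")

def custom_rule_score(rules, subject, body):
    """Sum the scores of every custom rule whose pattern matches."""
    total = 0.0
    for rule in rules:
        text = text_for_target(rule["target"], subject, body)
        if re.search(rule["pattern"], text):
            total += rule["score"]
    return total

# Hypothetical rules, used only to show the threshold arithmetic.
rules = [
    {"name": "strong-rule", "target": "subject",
     "pattern": r"(?i)wire transfer", "score": 0.8},
    {"name": "weak-rule", "target": "body",
     "pattern": r"(?i)act now", "score": 0.3},
]

strong = custom_rule_score(rules, "Urgent wire transfer", "Please see attached.")
weak = custom_rule_score(rules, "Hello", "Act now to claim your prize.")

print(strong >= SPAM_THRESHOLD)  # True: 0.8 alone crosses 0.75
print(weak >= SPAM_THRESHOLD)    # False: 0.3 needs other signals
```

In the real system the built-in heuristics contribute to the same composite score, so a 0.3 rule can still tip a message that other signals have already pushed close to the threshold.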

Writing Real Rules: Three Examples

Example 1: Crypto Scam Pattern

Cryptocurrency investment scams follow predictable language patterns. They tend to promise extraordinary returns, reference Bitcoin or other cryptocurrencies by name, and include urgency language around "limited time" offers. Here's a rule that catches a common variant:

- name: "crypto-investment-scam"
  description: "Catches crypto investment return promises"
  target: both
  pattern: "(?i)(bitcoin|crypto|BTC|ethereum).{0,60}(guaranteed|risk.free|daily.return|profit|multiply)"
  score: 0.55

Breaking down the regex: (?i) makes it case-insensitive. The first group matches any of the common crypto terms. .{0,60} allows up to 60 characters between the crypto term and the promise language — enough to catch phrases like "your Bitcoin investment will generate guaranteed daily returns" without requiring exact phrase matching. The second group catches the promise vocabulary.

A score of 0.55 covers most of the distance to the default 0.75 threshold, leaving 0.20 to come from other signals. Combined with other weak signals (free webmail sender, missing plain-text part), a matching message will usually be caught.
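
A quick way to get a feel for what this pattern does and does not match is to run it locally. The sketch below uses Python's re module, which may differ in small ways from the engine Indition Spam Killer uses, so treat it as a first check rather than a substitute for the rule tester. The sample strings and the matches helper are made up for illustration.

```python
import re

# Pattern copied from the crypto-investment-scam rule above.
CRYPTO_RULE = re.compile(
    r"(?i)(bitcoin|crypto|BTC|ethereum).{0,60}"
    r"(guaranteed|risk.free|daily.return|profit|multiply)"
)

def matches(text):
    """Return True if the rule pattern is found anywhere in the text."""
    return CRYPTO_RULE.search(text) is not None

# Crypto term followed closely by promise language: matches.
hit = matches("Your Bitcoin investment will generate guaranteed daily returns")

# Promise language more than 60 characters after the crypto term: no match.
too_far = matches("Bitcoin " + "x" * 80 + " profit")

# Promise language before the crypto term: no match, because the
# pattern only looks for the promise after the term.
reversed_order = matches("Guaranteed profit with Bitcoin")

print(hit, too_far, reversed_order)  # True False False
```

The last case is worth noting: the pattern is order-sensitive, so a campaign that leads with the promise and names the currency afterward would need a second pattern or a reordered variant.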

Example 2: Advance Fee / "Nigerian Prince" Pattern

The advance fee fraud pattern is ancient but persistent, and it often arrives in variations that evade generic filters by changing surface details while keeping the structural language. This rule targets the core offer structure:

- name: "advance-fee-fraud"
  description: "Advance fee fraud / inheritance scam pattern"
  target: body
  pattern: "(?i)(confidential|strictly.confidential).{0,200}(million.dollars?|million.USD|funds?|inheritance|estate).{0,200}(assistance|help|partner|share|percentage|commission)"
  score: 0.65

This catches the three-act structure of an advance fee pitch: the confidential framing, the large sum of money, and the request for the recipient's participation in exchange for a share. The generous .{0,200} spacing tolerates the verbose phrasing these messages typically use. A score of 0.65 is fairly aggressive and reflects that this combination is rare in legitimate mail. Legal or financial correspondence that mentions confidential funds or estates can still match, so test the rule against your own traffic before relying on it.
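
One detail to check when testing this rule is how the gaps behave across line breaks. In Python's re module, which is used here only as a stand-in, . does not match a newline by default, so a pitch wrapped over several lines is missed unless the text is flattened first. Whether Indition Spam Killer's engine or its decoded body text behaves the same way is not stated in this guide; the rule tester is the place to confirm it. The sample messages and the flatten helper are hypothetical.

```python
import re

# Pattern copied from the advance-fee-fraud rule above.
ADVANCE_FEE = re.compile(
    r"(?i)(confidential|strictly.confidential).{0,200}"
    r"(million.dollars?|million.USD|funds?|inheritance|estate).{0,200}"
    r"(assistance|help|partner|share|percentage|commission)"
)

one_line = (
    "This matter is strictly confidential. I must move 15 million dollars "
    "abroad and need your assistance in return for a share."
)
wrapped = (
    "This matter is strictly confidential.\n"
    "I must move 15 million dollars abroad\n"
    "and need your assistance in return for a share."
)

def flatten(text):
    """Collapse every run of whitespace, including newlines, to one space."""
    return re.sub(r"\s+", " ", text)

print(bool(ADVANCE_FEE.search(one_line)))          # True
print(bool(ADVANCE_FEE.search(wrapped)))           # False in Python: . stops at \n
print(bool(ADVANCE_FEE.search(flatten(wrapped))))  # True
```

If the tester shows the same behavior on wrapped samples, replacing . in the gaps with a class that includes newlines, such as [\s\S], is one common workaround.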

Example 3: Fake Invoice Pattern

Fake invoice scams are a business email compromise technique: the attacker sends a message that looks like a routine accounts payable communication, hoping it will be forwarded to finance without close scrutiny. This rule targets the subject-line pattern these messages use most often:

- name: "fake-invoice-subject"
  description: "Fake invoice / payment request in subject"
  target: subject
  pattern: "(?i)(invoice|payment.request|remittance).{0,30}#?\\d{3,8}"
  score: 0.25

Note the lower score of 0.25. Invoice subjects are legitimately common — this rule adds weight rather than making a determination on its own. The pattern catches "Invoice #8472," "Payment Request 20251203," and similar constructs. When combined with other signals (mismatched Reply-To, sending domain different from claimed company, first-contact message with no thread history), the composite score will push the message into spam range.
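
The subject pattern can be checked locally in the same way. This sketch again uses Python's re module as a stand-in for the product's engine; the last two subjects are invented to show what the digit-count requirement excludes.

```python
import re

# In Python a raw string needs a single backslash (\d). In the YAML rule,
# the double-quoted string needs the backslash doubled (\\d).
INVOICE_SUBJECT = re.compile(
    r"(?i)(invoice|payment.request|remittance).{0,30}#?\d{3,8}"
)

subjects = [
    "Invoice #8472",
    "Payment Request 20251203",
    "Remittance advice attached",   # no number at all
    "Invoice 42",                   # number shorter than three digits
]

results = [bool(INVOICE_SUBJECT.search(s)) for s in subjects]

for subject, matched in zip(subjects, results):
    print(f"{subject!r}: {matched}")
```

The first two subjects match and the last two do not, which is the behavior the rule description promises.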

Testing Your Rules Before Deploying

Indition Spam Killer includes a rule tester in the admin dashboard. Before activating a new custom rule, paste sample messages into the tester to confirm: (a) the rule matches the messages you intend to catch, and (b) the rule doesn't match legitimate messages you want to allow.

For each rule you write, test it against at least three samples of the spam pattern it targets and three samples of potentially similar legitimate mail. A rule targeting invoice subjects should be verified against your actual AP communications to confirm the pattern doesn't produce false positives on your real supplier invoices.
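
If you keep sample messages on disk, the same check can be scripted as a complement to the dashboard tester. The sketch below is one hypothetical way to do it in Python: evaluate_rule, the sample subjects, and the supplier name are all invented for the example, and Python's regex engine may not match the product's in every detail.

```python
import re

def evaluate_rule(pattern, should_match, should_not_match):
    """Return (missed, false_positives) for a pattern and two sample lists."""
    regex = re.compile(pattern)
    missed = [s for s in should_match if not regex.search(s)]
    false_positives = [s for s in should_not_match if regex.search(s)]
    return missed, false_positives

# Pattern from the fake-invoice-subject rule.
pattern = r"(?i)(invoice|payment.request|remittance).{0,30}#?\d{3,8}"

spam_subjects = [
    "Invoice #8472 overdue",
    "Payment Request 20251203",
    "REMITTANCE advice 55120",
]
legitimate_subjects = [
    "Lunch on Friday?",
    "Invoice 2291 from Acme Supplies",   # a genuine supplier invoice
    "Minutes from Monday's meeting",
]

missed, false_positives = evaluate_rule(
    pattern, spam_subjects, legitimate_subjects
)

print("missed:", missed)
print("false positives:", false_positives)
```

Here the rule catches all three spam samples but also matches the genuine supplier invoice, which is exactly why that rule carries a low score instead of deciding on its own.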

During the initial deployment period, consider setting a new rule's score lower than its final intended value and monitoring results for a week before raising it. This gives you confidence in the pattern's behavior against real traffic before it starts making final disposition decisions.

Common Regex Pitfalls

A few mistakes that cause custom rules to behave unexpectedly:

Forgetting case insensitivity. Email text arrives in any mix of upper and lower case. Always include (?i) at the start of patterns unless you have a specific reason to be case-sensitive.
Greedy quantifiers matching too broadly. .+ matches everything from your first token to the last occurrence of your second token on the line — often an enormous span that produces false matches. Prefer bounded quantifiers like .{0,100} over unbounded ones.
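Not escaping dots and special characters. In regex, . matches any character, so the pattern risk.free matches both "risk-free" and "riskXfree". If you want a literal dot, escape it as risk\.free. Inside the double-quoted YAML strings used in this guide the backslash itself must be doubled (risk\\.free), as Example 3 does with \\d. This matters most for domain name patterns.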
Overly narrow anchoring. In the default mode of most regex engines, a pattern anchored with ^ and $ has to match the entire text it is applied to. Message bodies are multi-line, so an anchored body pattern will almost never trigger. Leave anchoring off for body patterns and use it judiciously for subject patterns, only when you need exact-start matching.
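
Each of these pitfalls is easy to reproduce. The checks below use Python's re module purely as a demonstration; the sample strings are invented, and the product's engine may differ in edge cases, so confirm anything surprising in the rule tester.

```python
import re

# A long single line where the two tokens are far apart and unrelated.
filler = "Filler sentence. " * 10
line = "Bitcoin price update. " + filler + "Our nonprofit thanks you."

# A short multi-line message body.
body = "Hello,\nYou have won a prize.\nRegards"

results = {
    # Case: without (?i) an upper-case variant slips through.
    "case_sensitive_misses": re.search(r"bitcoin", "BITCOIN giveaway") is None,
    "case_insensitive_hits": re.search(r"(?i)bitcoin", "BITCOIN giveaway") is not None,
    # Greedy: .+ happily spans the whole line; a bounded gap does not.
    "unbounded_matches_far_apart": re.search(r"(?i)bitcoin.+profit", line) is not None,
    "bounded_rejects_far_apart": re.search(r"(?i)bitcoin.{0,60}profit", line) is None,
    # Dots: a bare . matches any character; an escaped one is literal.
    "bare_dot_matches_any_char": re.search(r"example.com", "exampleXcom") is not None,
    "escaped_dot_is_literal": re.search(r"example\.com", "exampleXcom") is None,
    # Anchors: ^ and $ refer to the whole body here, not to each line.
    "anchored_body_misses": re.search(r"^You have won a prize\.$", body) is None,
    "unanchored_body_hits": re.search(r"You have won a prize\.", body) is not None,
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Every entry prints True, meaning each pitfall behaves as described in the list above.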

Keeping Your Rules Maintainable

A custom rule set grows quickly. After six months of adding rules reactively, it's easy to end up with 40 rules where some overlap, some are outdated, and nobody is sure which ones are still doing useful work.

A few practices keep the ruleset healthy: give every rule a meaningful name and a description explaining what campaign or pattern it was written to catch. Review the rule hit statistics monthly — a rule that hasn't triggered in 90 days either isn't needed anymore or covers a pattern that has changed. Keep rules in version control so you can track when each was added and what prompted it. And prefer tighter patterns with moderate scores over loose patterns with high scores; it's better to need two rules working together than to have one over-broad rule causing false positives.
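
Some of these checks can be automated before a rule ever reaches the dashboard. The following is a rough sketch of a linter, assuming rules have already been loaded from YAML into Python dictionaries with the five fields described earlier. The lint_rule function and its warning messages are hypothetical, and the check for unbounded quantifiers is a simple heuristic, not a full regex parser.

```python
import re

VALID_TARGETS = {"subject", "body", "both"}
REQUIRED_FIELDS = ("name", "description", "target", "pattern", "score")

def lint_rule(rule):
    """Return a list of human-readable problems found in one rule dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if rule.get(field) in ("", None):
            problems.append(f"missing {field}")

    target = rule.get("target")
    if target and target not in VALID_TARGETS:
        problems.append("target must be subject, body, or both")

    score = rule.get("score")
    if score is not None:
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append("score must be between 0.0 and 1.0")

    pattern = rule.get("pattern") or ""
    if pattern:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"pattern does not compile: {exc}")
        if not pattern.startswith("(?i)"):
            problems.append("pattern is case-sensitive (no leading (?i))")
        # Heuristic: an unescaped dot directly followed by + or *.
        if re.search(r"(?<!\\)\.[+*]", pattern):
            problems.append("pattern uses an unbounded .+ or .*")
    return problems

good_rule = {
    "name": "crypto-investment-scam",
    "description": "Catches crypto investment return promises",
    "target": "both",
    "pattern": "(?i)(bitcoin|crypto|BTC|ethereum).{0,60}"
               "(guaranteed|risk.free|daily.return|profit|multiply)",
    "score": 0.55,
}
bad_rule = {
    "name": "tmp",
    "description": "",
    "target": "header",
    "pattern": "bitcoin.+profit",
    "score": 1.5,
}

print(lint_rule(good_rule))  # []
for problem in lint_rule(bad_rule):
    print("bad_rule:", problem)
```

Running a check like this from a pre-commit hook or CI job fits naturally with keeping the rules in version control, since problems surface in review instead of in production traffic.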