
OpenAI Moderation API vs Custom-Rule LLM Moderation: When to Use Which

OpenAI's free Moderation API ships in three minutes. Custom-rule LLM moderation takes thirty. Here's the honest trade-off, with cost math.

OpenAI’s Moderation API is free, fast, and well-engineered. We’re a paid product. Why would anyone pick us?

This isn’t a hit piece. The OpenAI Moderation API is genuinely good at what it does. But what it does is not always what you need. Here’s the honest trade-off.

What OpenAI Moderation actually returns

OpenAI’s API takes content (text or, with the omni-moderation models, images) and returns scores for a fixed list of categories. The categories are baked into the model:

  • sexual
  • sexual/minors
  • harassment
  • harassment/threatening
  • hate
  • hate/threatening
  • illicit
  • illicit/violent
  • self-harm
  • self-harm/intent
  • self-harm/instructions
  • violence
  • violence/graphic

For each category, you get a score from 0 to 1 plus a boolean flag. That’s it. The API is generous on volume (no aggressive rate limits), low-latency, and free.
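
To make that shape concrete, here is a minimal sketch of reading one result. The `categories` and `category_scores` fields mirror the API response; the `categoriesAbove` helper, the sample scores, and the thresholds are illustrative assumptions, not real OpenAI output.

```javascript
// Illustrative sample shaped like one entry of `results` in a moderation
// response. The scores are made up for this sketch.
const sampleResult = {
  flagged: true,
  categories: { harassment: true, "harassment/threatening": false, violence: false },
  category_scores: { harassment: 0.91, "harassment/threatening": 0.12, violence: 0.03 },
};

// Hypothetical helper: list the categories whose score meets your own
// threshold, highest score first.
function categoriesAbove(result, threshold) {
  return Object.entries(result.category_scores)
    .filter(([, score]) => score >= threshold)
    .sort(([, a], [, b]) => b - a)
    .map(([name]) => name);
}

console.log(categoriesAbove(sampleResult, 0.1).join(", "));
// prints "harassment, harassment/threatening"
```

The boolean flags use OpenAI’s own cut-offs; reading the raw scores lets you set a stricter or looser threshold per category.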

When OpenAI Moderation is the right answer

If your moderation policy is some subset of those 13 categories, you should just use OpenAI’s API and move on. Specifically:

  • You’re wrapping an LLM and want a safety net on inputs / outputs.
  • You run a forum and want to block obvious hate / harassment / sexual content.
  • You’re a hobby project that doesn’t want to pay for moderation.

For these, the categories map onto your policy 1:1. You don’t need anything fancier.

When OpenAI Moderation falls short

Almost every business policy we’ve seen in the wild does not map 1:1 onto those 13 categories. A few real examples from our pilots:

  • Marketplaces: “No phone numbers, no Telegram / WhatsApp handles, no external profile links.” None of OpenAI’s categories cover this.
  • Dating apps: “No links to monetisation platforms, no underage indicators, no escort solicitation.” Partial overlap with sexual but you’d need to tune thresholds and add your own logic.
  • Customer support: “Flag legal threats for human review.” OpenAI returns near-zero scores across the board for “I’ll sue your company into the ground”: it’s not violent, not harassment, just litigious.
  • AI chat products: “Block prompt injection attempts.” Out of scope entirely.
  • Brand safety: “No mentions of our competitors in support replies.” Obviously not a thing OpenAI tracks.

The pattern: any rule that references your specific business falls outside the 13 categories, and you end up writing glue.

The glue is the hidden cost

What does the glue look like in practice? Something like:

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function moderate(text) {
  // pull OpenAI categories
  const m = await openai.moderations.create({ input: text });
  const { sexual, harassment } = m.results[0].category_scores;

  // our own regex + heuristics
  // (the hyphen sits last in the character class so it is read literally)
  const hasPhone = /\+?\d[\d\s().-]{8,}/.test(text);
  const hasTelegram = /\b(telegram|t\.me\/)/i.test(text);
  const hasEmail = /[\w.+-]+@[\w-]+\.[\w.-]+/.test(text);

  // custom decision logic
  if (hasPhone || hasTelegram || hasEmail) return "block_contact_leak";
  if (sexual > 0.7) return "block_sexual";
  if (harassment > 0.8) return "block_harassment";
  // ...
}

This code is fine. It also rots fast. Every product change adds another regex. Every false positive adds a special case. Every “allow this kind of profanity but not that one” turns into a state machine. After two years it’s a 1,500-line file owned by whoever last touched it.

Custom-rule LLM moderation: what changes

Custom-rule LLM moderation collapses that file into a list of sentences:

- No phone numbers in listings
- No external chat handles (WhatsApp, Telegram, Signal)
- No emails or URLs to off-platform profiles
- Block explicit sexual content
- Block threats and harassment
- Allow profanity if descriptive, not directed at a person

You send the content and the rule set name; you get back a decision plus a per-rule breakdown. When the policy changes, you edit the sentence. Engineering isn’t in the loop.
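
As a rough illustration of what a per-rule breakdown buys you, the sketch below rolls one up into a single decision. The field names (`rule`, `violated`) and the `summarize` helper are assumptions made for this example, not Simple Moderation’s actual response format.

```javascript
// Hypothetical per-rule breakdown for one listing. Field names are
// assumptions for this sketch, not a documented response format.
const breakdown = [
  { rule: "No phone numbers in listings", violated: true },
  { rule: "No external chat handles (WhatsApp, Telegram, Signal)", violated: false },
  { rule: "Block threats and harassment", violated: false },
];

// Block if any rule is violated, and report which rules triggered.
function summarize(perRule) {
  const violatedRules = perRule.filter((r) => r.violated).map((r) => r.rule);
  return { decision: violatedRules.length > 0 ? "block" : "allow", violatedRules };
}

console.log(summarize(breakdown).decision); // prints "block"
```

Because each violation is tied to a sentence in the rule set, a reviewer can see which sentence fired rather than which regex matched.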

The cost math, honestly

OpenAI’s Moderation API is free; a frontier LLM call is billed per token on every decision. So for any product processing millions of items, the inference bill is real money. Three things offset it:

  1. Pre-filtering. A well-built custom-rule system runs cheap rules first (regex, small classifiers) and only calls the big model when those don’t resolve. For most workloads, 60–80% of decisions never hit the LLM.
  2. Engineering glue. If your team spends 200+ hours a year gluing OpenAI categories onto business policy, reclaiming that time is worth roughly $30k at fully loaded engineer rates (about $150 per hour).
  3. Incident cost. One marketplace fraud incident from a missed contact-info leak can dwarf a year of moderation spend.
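
The pre-filtering idea in point 1 can be sketched as follows. This is a minimal illustration, not how any particular product implements it: the patterns are deliberately simple, and `callLlm` is a hypothetical async function standing in for whichever LLM moderator you use.

```javascript
// Cheap, deterministic checks. The patterns are illustrative, not exhaustive.
const CHEAP_RULES = [
  { label: "block_contact_leak", pattern: /\+?\d[\d\s().-]{8,}/ }, // phone-like
  { label: "block_contact_leak", pattern: /\b(telegram|whatsapp|t\.me\/)/i },
];

// `callLlm` is a hypothetical async function: (text) => decision string.
async function moderateWithPrefilter(text, callLlm) {
  for (const { label, pattern } of CHEAP_RULES) {
    // A cheap rule resolved the item, so the LLM is never called.
    if (pattern.test(text)) return { decision: label, usedLlm: false };
  }
  // Nothing cheap resolved it, so pay for the expensive call.
  return { decision: await callLlm(text), usedLlm: true };
}

// Usage with a stub in place of a real LLM call.
const stubLlm = async () => "allow";
moderateWithPrefilter("call me on +44 7700 900123", stubLlm).then((r) =>
  console.log(r.decision, r.usedLlm)
);
```

This sketch only short-circuits on blocks. A fuller pre-filter would also resolve clear allows cheaply, for example with a small classifier, which is where most of the savings would come from.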

For most B2B SaaS products processing 50k–5M items per month, custom-rule moderation comes out cheaper end-to-end. Below that, OpenAI’s free API is genuinely fine. Above 5M, the crossover depends heavily on how much policy churn you have and how strict your latency budget is.
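
To run this arithmetic on your own numbers, a back-of-the-envelope sketch like the one below may help. Every input is an assumption: the per-call price in particular is a placeholder, not a quoted rate, and the comparison ignores incident cost and latency entirely.

```javascript
// Monthly LLM spend: only the share of items the pre-filter cannot
// resolve reaches the LLM.
function monthlyLlmCost({ itemsPerMonth, llmShare, costPerLlmCall }) {
  return itemsPerMonth * llmShare * costPerLlmCall;
}

// Volume at which LLM spend equals the monthly engineering cost it replaces.
function breakEvenItems({ monthlyGlueCost, llmShare, costPerLlmCall }) {
  return monthlyGlueCost / (llmShare * costPerLlmCall);
}

const assumptions = {
  monthlyGlueCost: 30000 / 12, // the $30k/year glue estimate, per month
  llmShare: 0.3,               // 70% resolved by the pre-filter (midpoint of 60–80%)
  costPerLlmCall: 0.002,       // hypothetical price per LLM decision, in dollars
};

const spend = monthlyLlmCost({ itemsPerMonth: 1_000_000, ...assumptions });
console.log(`LLM spend at 1M items/month: $${spend.toFixed(2)}`);
console.log(`Break-even volume: ${Math.round(breakEvenItems(assumptions))} items/month`);
```

Under these particular assumptions the break-even lands at a little over 4 million items per month. Doubling the per-call price halves it, which is why the crossover at high volume is so sensitive to your own inputs.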

Use both

The best architectures we see use both: OpenAI Moderation as a fast, free pre-filter for the obvious categories (sexual, violence, hate), and a custom-rule LLM moderator like Simple Moderation for the business-specific rules. You get the latency and cost benefits of the classifier on the bulk of traffic, and the expressive power of LLMs where you actually need it.
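
A minimal sketch of that two-stage flow is shown below. Both moderators are injected so the example runs without network access: `openaiModerate` is assumed to resolve to one moderation result (with `flagged` and `categories`), and `customModerate` is a hypothetical stand-in for a custom-rule LLM moderator that returns a decision string.

```javascript
// Stage 1: free classifier for the generic categories.
// Stage 2: custom-rule moderator for business-specific policy.
async function moderateBoth(text, { openaiModerate, customModerate }) {
  const result = await openaiModerate(text);
  if (result.flagged) {
    // Collect the category names OpenAI flagged as true.
    const hits = Object.keys(result.categories).filter((c) => result.categories[c]);
    return { decision: "block", stage: "openai", hits };
  }
  // Passed the generic categories; now apply the business rules.
  return { decision: await customModerate(text), stage: "custom", hits: [] };
}

// Usage with stubs in place of the real services.
const stubs = {
  openaiModerate: async () => ({ flagged: false, categories: { hate: false } }),
  customModerate: async () => "block_contact_leak",
};
moderateBoth("WhatsApp me instead", stubs).then((r) => console.log(r.stage, r.decision));
```

With the official Node SDK, `openaiModerate` could be a thin wrapper that calls `openai.moderations.create({ input: text })` and returns `results[0]`.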

That’s the case for paying us, in one paragraph: we’re not competing with OpenAI Moderation. We’re the thing you reach for when OpenAI Moderation runs out.
