Reduce LLM Costs With a Keyword Classifier

The cheapest API call is the one you never make.

That single idea is the most reliable way to cut AI costs, and almost nobody designs for it. The instinct, when you build anything with a language model, is to send the work to the model and let it sort things out. It works. It also bleeds money, because you end up paying a frontier model to recognize a newsletter it has seen ten thousand times.

The better order is the reverse: run a cheap keyword classifier first, and call the language model last, only for the work that actually needs it. I built an email assistant on exactly this principle. It sorts every email in my inbox, files deadlines to the calendar, and drafts replies, and most of that work never touches a model at all.

The short version: A keyword classifier can handle some predictable work without a model API call. The language model receives unmatched or low-confidence cases. The savings depend on match coverage, false-positive cost, model usage, and the review process. Measure precision and recall by category before allowing rules to take irreversible action, and keep a queue for ambiguous work when the model is unavailable.
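One way to measure precision and recall by category is to hand-label a sample of emails and compare the rule output against those labels. The sketch below is a minimal illustration under that assumption; the function name, the label strings, and the sample data are all hypothetical.

```python
from collections import Counter


def precision_recall_by_category(pairs):
    """pairs: iterable of (predicted, actual) labels.

    predicted is None when no rule matched; actual is None when the email
    belongs to no rule category. Returns {category: (precision, recall)},
    with None in place of a ratio that has an empty denominator.
    """
    true_pos, predicted_count, actual_count = Counter(), Counter(), Counter()
    for predicted, actual in pairs:
        if predicted is not None:
            predicted_count[predicted] += 1
        if actual is not None:
            actual_count[actual] += 1
        if predicted is not None and predicted == actual:
            true_pos[predicted] += 1

    report = {}
    for category in set(predicted_count) | set(actual_count):
        tp = true_pos[category]
        n_pred, n_actual = predicted_count[category], actual_count[category]
        precision = tp / n_pred if n_pred else None
        recall = tp / n_actual if n_actual else None
        report[category] = (precision, recall)
    return report


# Hypothetical hand-labeled sample: (rule output, human label).
sample = [
    ("receipt", "receipt"),
    ("receipt", "receipt"),
    ("receipt", "newsletter"),   # false positive for "receipt"
    ("newsletter", "newsletter"),
    (None, "newsletter"),        # missed by the rules, would escalate
    (None, None),                # correctly left for the model
]
report = precision_recall_by_category(sample)
```

Low precision in a category is the signal to keep that rule away from irreversible actions; low recall only means more mail escalates to the model.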

Run the keyword classifier first

Here is the order that matters. Every email hits a keyword classifier first. Plain rules, no API call, no per-message cost, negligible latency.

A receipt, a calendar invite, a newsletter, a payment alert. These follow patterns, and patterns are cheap to match. The classifier handles the bulk of the inbox before anything expensive runs:

def classify(email):
    """Free, instant, deterministic. Returns a category or None."""
    text = f"{email.subject} {email.sender}".lower()

    if any(k in text for k in ("receipt", "order #", "your invoice")):
        return "receipt"
    if "unsubscribe" in email.body.lower():
        return "newsletter"
    if email.content_type == "text/calendar" or "invitation:" in text:
        return "calendar_invite"
    if any(k in text for k in ("payment", "card ending", "transaction")):
        return "payment_alert"

    return None  # didn't match a known shape, escalate


def handle(email):
    category = classify(email)
    if category is not None:
        return route(category, email)   # no model involved
    return model_triage(email)          # the expensive path, used last

Only when the keywords miss does the model wake up. The email that does not fit a known shape gets sent to a language model for a real read. That is the email worth paying for. Everything else was already sorted for free.

People reach for the model first because it is easy. Send every email to the API, let it decide, move on. It works, and it quietly drains the budget. You are renting a language model to do pattern matching a regular expression does for nothing. The model is not the clever part of that interaction. The clever part is knowing you did not need it.
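For a sense of what regular-expression rules might look like, here is a rough sketch. The RULES table, the patterns, and classify_subject are illustrative assumptions, not the assistant's actual rules. The point it shows is that word boundaries avoid some false matches a plain substring check would make.

```python
import re

# Hypothetical rule table: (category, compiled pattern). First match wins,
# so order the rules from most specific to least specific.
RULES = [
    ("receipt", re.compile(r"\b(?:receipt|invoice)\b|\border #\d+", re.IGNORECASE)),
    ("calendar_invite", re.compile(r"^invitation:", re.IGNORECASE)),
    ("payment_alert", re.compile(r"\bcard ending(?: in)? \d{4}\b|\btransaction\b", re.IGNORECASE)),
]


def classify_subject(subject):
    """Return the first matching category for a subject line, else None."""
    for category, pattern in RULES:
        if pattern.search(subject):
            return category
    return None  # no rule matched, escalate to the model


print(classify_subject("Your receipt from Acme"))            # receipt
print(classify_subject("Invitation: Team sync"))             # calendar_invite
print(classify_subject("Receiptless returns policy question"))  # None
```

The last subject contains the substring "receipt" but is not a receipt. The word-boundary pattern lets it fall through to the model, where a substring check would have filed it.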

The cost line every feature should pass through

This is the cost line I run every feature through, and it is two questions long. What is the cheap path, and how much of the work can it carry?

Frame it that way and a model stops being the centerpiece and becomes what it actually is: an exception handler with an excellent vocabulary. The cheap path carries the routine. The expensive tool handles what the routine cannot. Treat the model like the specialist you call in, not the receptionist who greets everyone. The same order pays off in outreach. Cheap rules qualify a lead first, so the expensive touch, a sales call or an AI caller, only fires on the ones worth it.

The table below is an illustrative scenario, not a measured production result. Assume 10,000 emails a day and a classifier that routes 85 percent without a model call:

Approach	Emails sent to the model	Relative model spend
Model on everything	10,000 / day	100% (baseline)
Classifier first, model last	1,500 / day	~15%

That routing would remove 85 percent of model calls. It does not guarantee the same output or capability. Rules can misclassify messages, and the remaining model calls may be longer or more complex. Measure end-to-end spend, precision, recall, false-action cost, and reviewer time against the model-on-everything baseline.
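The arithmetic behind the scenario can be sketched in a few lines. This is a back-of-envelope estimate, not a billing model: relative_model_spend, escalated_per_day, and the escalated_cost_ratio parameter are hypothetical names, and the ratio exists only to show how longer escalated calls eat into the savings.

```python
def relative_model_spend(coverage, escalated_cost_ratio=1.0):
    """Model spend as a fraction of the model-on-everything baseline.

    coverage: fraction of emails the rules route without a model call (0 to 1).
    escalated_cost_ratio: average cost of an escalated call divided by the
        average cost of a call in the baseline (assumed 1.0 if unknown).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return (1.0 - coverage) * escalated_cost_ratio


def escalated_per_day(daily_volume, coverage):
    """Number of emails per day that still reach the model."""
    return round(daily_volume * (1.0 - coverage))


calls = escalated_per_day(10_000, 0.85)           # 1500 emails reach the model
same_cost = relative_model_spend(0.85)            # about 0.15 of baseline
pricier_calls = relative_model_spend(0.85, 2.0)   # about 0.30 of baseline
```

If the escalated emails cost twice as much per call as the average email, the same 85 percent coverage leaves about 30 percent of baseline spend rather than 15.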

Why a cheap path beats a language model on reliability

There is a reliability dividend on top of the savings, and it is easy to overlook.

The keyword path has no rate limit, no outage, no token bill. When the model is slow, degraded, or fully down, the bulk of the inbox still sorts itself, because the bulk never depended on the model in the first place. You have quarantined your most fragile, most expensive dependency so it sits in front of only the slice of work that genuinely requires it. The blast radius of an API outage shrinks to “ambiguous emails wait a bit,” instead of “the whole assistant is dead.” There is a security dividend in the same move. A rule cannot be talked into obeying a stranger, so the work that never reaches a model has nothing for a hidden instruction to hijack.

And when the primary model call fails, whether it is rate-limited, timed out, or returning an API error, a fallback model picks up the request. The fallback is not the front door, it is the last resort behind a free path that already did most of the job:

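def model_triage(email):
    # RateLimitError and APIError stand in for your model client's own
    # exception classes; TimeoutError is the Python built-in.
    try:
        return primary_model(email)      # only unmatched emails reach here
    except (RateLimitError, TimeoutError, APIError):
        # If the fallback also fails, its exception propagates to the caller.
        return fallback_model(email)     # cheaper/secondary model as backstop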

Notice the shape of the whole thing. The free, deterministic path is first and does the most. The expensive model is second and does the rest. The fallback model is third and only exists for the moments the second one stumbles. Every layer below the rule path is more fragile than the rules, and each one is reached less often than the layer above it.
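The model_triage sketch above does not say what happens when the fallback model is down too. One option, in line with keeping a queue for ambiguous work, is to park the email and retry later. The following is a minimal sketch under stated assumptions: ModelUnavailable, triage_or_queue, drain, and the two stub models are hypothetical, and the in-memory deque stands in for whatever durable queue a real system would use.

```python
from collections import deque


class ModelUnavailable(Exception):
    """Hypothetical stand-in for the errors your model client raises."""


def triage_or_queue(email, models, queue):
    """Try each model in order; park the email if every one is unavailable.

    Returns the first model's result, or None when the email was queued.
    """
    for model in models:
        try:
            return model(email)
        except ModelUnavailable:
            continue  # try the next model in the list
    queue.append(email)
    return None


def drain(queue, models):
    """Retry each parked email once; anything that still fails stays queued."""
    results = []
    for _ in range(len(queue)):  # length is fixed before the loop starts
        email = queue.popleft()
        result = triage_or_queue(email, models, queue)
        if result is not None:
            results.append(result)
    return results


# Stub models for illustration only.
def down_model(email):
    raise ModelUnavailable()


def healthy_model(email):
    return f"triaged:{email}"


pending = deque()
first = triage_or_queue("a", [down_model, healthy_model], pending)   # fallback answers
second = triage_or_queue("b", [down_model, down_model], pending)     # parked
parked = list(pending)
recovered = drain(pending, [healthy_model])                          # retried later
```

The rule path is unaffected by any of this, which is the point: only the ambiguous emails wait.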

Cheap by default, smart when it has to be

This is not a compromise you make to save a few dollars. It is a cleaner design on every axis that matters. It is cheaper, because most work never hits the meter. It is faster, because the common case skips a network round trip entirely. It is more reliable, because your dependency on a remote model now covers only the cases that truly need one.

Cheap by default, smart when it has to be. That is not a tradeoff. That is the design.

Frequently asked questions

What is the simplest way to cut AI costs in production?

Test whether predictable categories can be handled by keyword rules, regular expressions, or a small local classifier. Reserve the language model for unmatched or low-confidence cases. Savings and quality depend on the workload, so measure both against a baseline.

Won’t a keyword classifier miss edge cases that a model would catch?

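That is what the escalation path is for. The classifier acts only when a rule matches, and anything it does not recognize falls through to the model. The gap is false positives: a message that matches the wrong rule never escalates, which is why per-category precision matters before rules take irreversible action. You are not replacing the model, you are making sure it only sees the work that needs its judgment.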

Does this add a lot of complexity to the system?

Very little. In most cases it is one function and one branch: if the cheap path matches, handle it there. Otherwise call the model. That is far simpler to reason about than a system where every code path depends on a remote API being fast and available.

How do I decide what goes in the cheap path versus the model?

Ask two questions of every feature: what is the cheapest path that produces a correct result, and how much of the total volume can that path carry? Anything that follows a stable pattern (receipts, invites, alerts, well-structured data) belongs in the cheap path. Anything that requires real comprehension goes to the model.
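To put a number on the second question, you can run the rules over a sample of real traffic and count what share each category carries. A small sketch, assuming a classify function that returns a category or None; coverage_report, toy_classify, and the sample subjects are hypothetical.

```python
from collections import Counter


def coverage_report(emails, classify):
    """Share of a sample handled by each category.

    The None key holds the share that no rule matched, which is the
    share that would be escalated to the model.
    """
    counts = Counter(classify(email) for email in emails)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Toy classifier and sample, for illustration only.
def toy_classify(subject):
    return "receipt" if "receipt" in subject.lower() else None


sample_subjects = [
    "Your receipt",
    "Receipt #2",
    "Lunch tomorrow?",
    "Re: contract question",
]
shares = coverage_report(sample_subjects, toy_classify)
```

Coverage says how much the cheap path carries, not whether it carries it correctly, so read it alongside precision for the same sample.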

What about reliability when the model API goes down?

That is one of the biggest wins. Because the cheap path has no dependency on the model, the bulk of the work keeps running during an outage. Only the ambiguous slice is affected, and even that can route to a cheaper fallback model so it degrades gracefully instead of failing outright.
