
N°008 · 2026.05.14 · 6 MIN

How to Cut AI Costs: Run a Keyword Classifier First, Call the Model Last

By Aldridge Dagos


The cheapest API call is the one you never make.

That single idea is the most reliable way to cut AI costs, and almost nobody designs for it. The instinct, when you build anything with a language model, is to send the work to the model and let it sort things out. It works. It also bleeds money, because you end up paying a frontier model to recognize a newsletter it has seen ten thousand times.

The better order is the reverse: run a cheap keyword classifier first, and call the language model last, only for the work that actually needs it. I built an email assistant on exactly this principle. It sorts every email in my inbox, files deadlines to the calendar, and drafts replies, and most of that work never touches a model at all.

The short version: A keyword classifier handles the predictable bulk of the work for free, with no API call, no latency, and no rate limit. The language model only wakes up for the genuinely ambiguous cases. You pay for the exceptions, not the routine, and the system keeps working even when the model is down.

Run the keyword classifier first

Here is the order that matters. Every email hits a keyword classifier first. Plain rules, no API, no cost, no latency.

A receipt, a calendar invite, a newsletter, a payment alert. These follow patterns, and patterns are cheap to match. The classifier handles the bulk of the inbox before anything expensive runs:

def classify(email):
    """Free, instant, deterministic. Returns a category or None."""
    text = f"{email.subject} {email.sender}".lower()

    if any(k in text for k in ("receipt", "order #", "your invoice")):
        return "receipt"
    if "unsubscribe" in email.body.lower():
        return "newsletter"
    if email.content_type == "text/calendar" or "invitation:" in text:
        return "calendar_invite"
    if any(k in text for k in ("payment", "card ending", "transaction")):
        return "payment_alert"

    return None  # didn't match a known shape, escalate


def handle(email):
    category = classify(email)
    if category is not None:
        return route(category, email)   # no model involved
    return model_triage(email)          # the expensive path, used last

Only when the keywords miss does the model wake up. The email that does not fit a known shape gets sent to a language model for a real read. That is the email worth paying for. Everything else was already sorted for free.

People reach for the model first because it is easy. Send every email to the API, let it decide, move on. It works, and it quietly drains the budget. You are renting a language model to do pattern matching that a regular expression does for nothing. The model is not the clever part of that interaction. The clever part is knowing you did not need it.
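Since the paragraph above mentions regular expressions, here is a minimal sketch of the same classifier written as an ordered rule table. It is an illustration, not the assistant's actual code: the `Email` dataclass, the `RULES` list, and the exact patterns are assumptions. Word boundaries (`\b`) make the rules a little stricter than plain substring checks, so `transaction` does not fire on `transactional`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    # Hypothetical minimal email record; the field names are assumptions.
    subject: str
    sender: str
    body: str = ""


# Ordered (category, pattern) pairs. The first rule that matches wins.
RULES = [
    ("receipt", re.compile(r"\breceipt\b|\border #\d+|\byour invoice\b", re.I)),
    ("calendar_invite", re.compile(r"\binvitation:", re.I)),
    ("payment_alert", re.compile(r"\bcard ending\b|\btransaction\b", re.I)),
]


def classify_regex(email: Email) -> Optional[str]:
    """Return the first matching category, or None to escalate to the model."""
    text = f"{email.subject} {email.sender}"
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return None


if __name__ == "__main__":
    print(classify_regex(Email("Your receipt from Acme", "billing@acme.example")))
    print(classify_regex(Email("Quick question about Thursday", "sam@example.com")))
```

Keeping the rules in a table rather than a chain of `if` statements makes it easy to add a category or reorder priorities without touching the matching logic.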

The cost line every feature should pass through

This is the cost line I run every feature through, and it is two questions long. What is the cheap path, and how much of the work can it carry?

Frame it that way and a model stops being the centerpiece and becomes what it actually is: an exception handler with an excellent vocabulary. The cheap path carries the routine. The expensive tool handles what the routine cannot. Treat the model like the specialist you call in, not the receptionist who greets everyone.

The numbers make the case better than any principle. Say you process 10,000 emails a day and a keyword classifier confidently handles 85% of them:

| Approach | Emails sent to the model | Relative model spend |
| --- | --- | --- |
| Model on everything | 10,000 / day | 100% (baseline) |
| Classifier first, model last | 1,500 / day | ~15% |

Same output, same drafts, same filed deadlines. You removed 85% of the model calls without removing any of the capability. The exact share depends on the inbox, but the shape tends to hold across real workloads, because most real workloads are mostly routine.
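The arithmetic behind those numbers fits in a few lines. This sketch assumes every model call costs roughly the same, which is a simplification: the emails that escalate may be longer or harder than average, so real spend could land somewhat above the call ratio. The function names are illustrative.

```python
def model_calls_per_day(emails_per_day: int, classifier_coverage: float) -> int:
    """Emails that still reach the model after the classifier takes its share."""
    if not 0.0 <= classifier_coverage <= 1.0:
        raise ValueError("classifier_coverage must be between 0 and 1")
    return round(emails_per_day * (1.0 - classifier_coverage))


def relative_model_spend(emails_per_day: int, classifier_coverage: float) -> float:
    """Model spend as a fraction of the model-on-everything baseline.

    Assumes a uniform cost per call (an assumption, not a measured figure).
    """
    if emails_per_day == 0:
        return 0.0
    return model_calls_per_day(emails_per_day, classifier_coverage) / emails_per_day


if __name__ == "__main__":
    calls = model_calls_per_day(10_000, 0.85)
    spend = relative_model_spend(10_000, 0.85)
    print(f"{calls} model calls per day, {spend:.0%} of baseline spend")
```

Plugging in a different coverage figure shows how sensitive the bill is to the classifier: at 50% coverage the same inbox still sends 5,000 emails a day to the model.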

Why a cheap path beats a language model on reliability

There is a reliability dividend on top of the savings, and it is easy to overlook.

The keyword path has no rate limit, no outage, no token bill. When the model is slow, degraded, or fully down, the bulk of the inbox still sorts itself, because the bulk never depended on the model in the first place. You have quarantined your most fragile, most expensive dependency so it sits in front of only the slice of work that genuinely requires it. The blast radius of an API outage shrinks to “ambiguous emails wait a bit,” instead of “the whole assistant is dead.”

And when the primary model call fails, whether from a rate limit, a timeout, or an API error, a fallback model catches it. The fallback is not the front door. It is the last resort behind a free path that already did most of the job:

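def model_triage(email):
    # RateLimitError and APIError stand in for whatever exception types
    # your model SDK raises; TimeoutError is the Python built-in.
    try:
        return primary_model(email)      # only the hard 15% reaches here
    except (RateLimitError, TimeoutError, APIError):
        return fallback_model(email)     # cheaper/secondary model as backstop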

Notice the shape of the whole thing. The free, deterministic path is first and does the most. The expensive model is second and does the rest. The fallback model is third and exists only for the moments the second one stumbles. The further down the stack you go, the more fragile the dependency, and the more rarely you reach it.
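To see the three layers behave under an outage, here is a self-contained sketch with stubbed models. Everything in it is hypothetical scaffolding: `ModelUnavailable` stands in for whatever errors a real SDK raises, and the stub functions only simulate a working or failing model. The function reports which layer answered, which is also a handy thing to log in a real system.

```python
from typing import Callable, Optional, Tuple


class ModelUnavailable(Exception):
    """Hypothetical stand-in for an SDK's rate-limit, timeout, or API errors."""


def triage(
    subject: str,
    classify: Callable[[str], Optional[str]],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (layer_that_answered, result) for one email subject."""
    category = classify(subject)
    if category is not None:
        return ("rules", category)               # layer 1: free, deterministic
    try:
        return ("primary", primary(subject))     # layer 2: the expensive model
    except ModelUnavailable:
        return ("fallback", fallback(subject))   # layer 3: the backstop


# Stubs that simulate each layer, including a primary-model outage.
def keyword_rules(subject: str) -> Optional[str]:
    return "receipt" if "receipt" in subject.lower() else None


def primary_ok(subject: str) -> str:
    return "needs_reply"


def primary_down(subject: str) -> str:
    raise ModelUnavailable("simulated outage")


def fallback_stub(subject: str) -> str:
    return "needs_reply"


if __name__ == "__main__":
    for subject in ("Your receipt from Acme", "Can we move the meeting?"):
        print(subject, "->", triage(subject, keyword_rules, primary_down, fallback_stub))
```

With the primary model down, the receipt is still handled by the rules layer and only the ambiguous email falls through to the fallback.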

Cheap by default, smart when it has to be

This is not a compromise you make to save a few dollars. It is a cleaner design on every axis that matters. It is cheaper, because most work never hits the meter. It is faster, because the common case skips a network round trip entirely. It is more reliable, because your dependency on a remote model now covers only the cases that truly need one.

Cheap by default, smart when it has to be. That is not a tradeoff. That is the design.

Frequently asked questions

What is the simplest way to cut AI costs in production?

Stop sending every request to the model. Put a cheap, deterministic classifier (keyword rules, regex, or a small local model) in front of it to handle the predictable majority of cases, and reserve the language model for the genuinely ambiguous ones. You usually remove the large majority of API calls without losing any capability.

Won’t a keyword classifier miss edge cases that a model would catch?

That is exactly what the escalation path is for. The classifier only acts when it is confident; anything it does not recognize falls through to the model. You are not replacing the model, you are making sure it only sees the work that needs its judgment.
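One way to make "only acts when it is confident" concrete is to escalate whenever more than one category matches. This is a stricter variant than a first-match-wins classifier, offered as an assumption about what confidence could mean here rather than as the assistant's actual rule. The `KEYWORDS` table is illustrative.

```python
from typing import Optional

# Illustrative keyword table; the categories and terms are assumptions.
KEYWORDS = {
    "receipt": ("receipt", "order #", "your invoice"),
    "calendar_invite": ("invitation:",),
    "payment_alert": ("payment", "card ending", "transaction"),
}


def classify_if_unambiguous(text: str) -> Optional[str]:
    """Return a category only when exactly one category matches.

    Zero matches or several competing matches both return None,
    which the caller treats as "escalate to the model".
    """
    lowered = text.lower()
    matches = [
        category
        for category, keywords in KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(classify_if_unambiguous("Your receipt from Acme"))
    print(classify_if_unambiguous("Payment receipt for order #99"))
```

The second subject matches both `receipt` and `payment_alert`, so it escalates instead of guessing. The trade-off is a few more model calls in exchange for fewer confident mistakes.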

Does this add a lot of complexity to the system?

Very little. In most cases it is one function and one branch: if the cheap path matches, handle it there; otherwise call the model. That is far simpler to reason about than a system where every code path depends on a remote API being fast and available.

How do I decide what goes in the cheap path versus the model?

Ask two questions of every feature: what is the cheapest path that produces a correct result, and how much of the total volume can that path carry? Anything that follows a stable pattern (receipts, invites, alerts, well-structured data) belongs in the cheap path. Anything that requires real comprehension goes to the model.
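The second question, how much volume the cheap path can carry, is something you can measure on a sample of real traffic before committing to a design. A rough sketch, assuming you have a list of past email subjects and any classifier that returns a category or `None`; `toy_classify` is a placeholder for your own rules.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple


def coverage_report(
    subjects: Iterable[str],
    classify: Callable[[str], Optional[str]],
) -> Tuple[float, Counter]:
    """Return (share handled by the cheap path, per-category counts)."""
    counts = Counter(classify(subject) or "escalated" for subject in subjects)
    total = sum(counts.values())
    if total == 0:
        return 0.0, counts
    coverage = 1 - counts["escalated"] / total
    return coverage, counts


def toy_classify(subject: str) -> Optional[str]:
    # Placeholder rules for the demo; swap in your real classifier.
    lowered = subject.lower()
    if "receipt" in lowered:
        return "receipt"
    if "invitation:" in lowered:
        return "calendar_invite"
    return None


if __name__ == "__main__":
    sample = [
        "Your receipt from Acme",
        "Invitation: planning sync",
        "Receipt for your order",
        "Can we talk about the contract?",
    ]
    coverage, counts = coverage_report(sample, toy_classify)
    print(f"cheap path carries {coverage:.0%} of the sample", dict(counts))
```

Running this over a week of real mail gives you the coverage figure to plug into the cost estimate, and the `escalated` bucket tells you which patterns are worth writing the next rule for.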

What about reliability when the model API goes down?

That is one of the biggest wins. Because the cheap path has no dependency on the model, the bulk of the work keeps running during an outage. Only the ambiguous slice is affected, and even that can route to a cheaper fallback model so it degrades gracefully instead of failing outright.