February 11, 2026 · 7 min read

Your LLM Bill Is 58% of Your AI Cost. What About the Other 42%?

When people say 'AI cost' they usually mean their OpenAI bill. That's 58% of the picture. Here's what's hiding in the other 42%.

Deborah

CTO at botanu

When people say "AI cost" they usually mean their OpenAI bill. Maybe Anthropic. Maybe Deepgram if they're doing voice. That's 58% of the picture.

We've been instrumenting AI workloads, and in a typical customer service stack the monthly cost splits like this: model/LLM costs are $7K (58% of total), infrastructure is $3.2K (26%), and the data pipeline is $1.9K (16%). Total: $12.1K/month.
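The percentages follow directly from the rounded dollar figures. Here is a minimal Python sketch of the arithmetic; the dictionary keys are illustrative labels, not names from any particular billing export.

```python
# Monthly cost by category in USD, using the rounded figures quoted above.
monthly_cost = {
    "model/LLM": 7_000,
    "infrastructure": 3_200,
    "data pipeline": 1_900,
}

total = sum(monthly_cost.values())

# Share of total per category, rounded to a whole percent.
shares = {name: round(100 * cost / total) for name, cost in monthly_cost.items()}

# Everything that is not the model bill.
non_llm_share = 100 - shares["model/LLM"]

for name, pct in shares.items():
    print(f"{name}: ${monthly_cost[name]:,} ({pct}%)")
print(f"total: ${total:,}, non-LLM share: {non_llm_share}%")
```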

The other 42% is the part that scales in ways nobody expects.

Multi-Agent Inference

A customer query doesn't hit one model. It hits an Intent Router (10.6K calls/month, $668), a Knowledge Agent (33.9K calls/month, $1,865), an Action Agent (8.5K calls/month, $349), and a Response Agent (10.6K calls/month, $1,272). Four agents. Four cost profiles. All of them behind a single outcome.
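Dividing each agent's monthly cost by its call volume makes the four cost profiles easier to compare. This is a rough sketch using the rounded numbers above, so the per-call values are approximate.

```python
# (calls per month, monthly cost in USD) for each agent, from the figures above.
agents = {
    "Intent Router": (10_600, 668),
    "Knowledge Agent": (33_900, 1_865),
    "Action Agent": (8_500, 349),
    "Response Agent": (10_600, 1_272),
}

# Average cost of a single call to each agent.
cost_per_call = {name: cost / calls for name, (calls, cost) in agents.items()}

total_agent_cost = sum(cost for _, cost in agents.values())
priciest_total = max(agents, key=lambda name: agents[name][1])
priciest_per_call = max(cost_per_call, key=cost_per_call.get)

for name, value in cost_per_call.items():
    print(f"{name}: ${value:.4f} per call")
print(f"highest monthly cost: {priciest_total}")
print(f"highest cost per call: {priciest_per_call}")
```

On these figures the Knowledge Agent has the largest monthly bill because of its call volume, while the Response Agent is the most expensive per call. A single vendor line item hides both facts.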

RAG Pipeline

Before the model generates anything, there are vector DB queries (920K/month, $598), embedding generation (54M/month, $1,080), and reranking (810K/month, $162). That's $1.8K/month just to retrieve context. Full pipeline cost per outcome is $0.146. The embedding cache saves $520/month. Most teams don't track RAG cost at the outcome level at all. They just see a Pinecone bill.
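One way to get RAG cost down to the outcome level is to tag every retrieval event with an outcome ID and price it at a unit rate. The sketch below derives unit rates from the monthly figures above (cost divided by volume). The event list, the `ticket-*` IDs, and the stage names are hypothetical, and it assumes the 54M embedding figure counts billable units.

```python
from collections import defaultdict

# Unit prices in USD, derived from the monthly figures above (cost / volume).
UNIT_PRICE = {
    "vector_query": 598 / 920_000,
    "embedding": 1_080 / 54_000_000,
    "rerank": 162 / 810_000,
}

# Hypothetical usage events: (outcome_id, stage, units consumed).
events = [
    ("ticket-1", "embedding", 4_000),
    ("ticket-1", "vector_query", 3),
    ("ticket-1", "rerank", 3),
    ("ticket-2", "embedding", 9_000),
    ("ticket-2", "vector_query", 6),
    ("ticket-2", "rerank", 5),
]


def rag_cost_by_outcome(usage_events):
    """Sum the priced usage of every RAG stage per outcome."""
    totals = defaultdict(float)
    for outcome_id, stage, units in usage_events:
        totals[outcome_id] += units * UNIT_PRICE[stage]
    return dict(totals)


costs = rag_cost_by_outcome(events)
for outcome_id, cost in costs.items():
    print(f"{outcome_id}: ${cost:.4f}")
```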

ETL and Data Infra

Pinecone for vectors ($1,000). Databricks for feature engineering ($420). dbt Cloud for transforms ($125). Snowflake for analytics ($280). Fivetran for sync ($92). These costs exist whether the AI resolves one query or ten thousand.
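Because these line items are fixed, their per-outcome share depends entirely on volume. Here is a small sketch of that amortization; the outcome counts are illustrative, not measured.

```python
# Fixed monthly data infrastructure costs in USD, from the list above.
data_infra = {
    "Pinecone": 1_000,
    "Databricks": 420,
    "dbt Cloud": 125,
    "Snowflake": 280,
    "Fivetran": 92,
}

fixed_monthly = sum(data_infra.values())


def fixed_cost_per_outcome(outcomes: int) -> float:
    """Spread the fixed monthly bill evenly across resolved outcomes."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return fixed_monthly / outcomes


# Illustrative volumes: the same bill looks very different per outcome.
for outcomes in (1, 1_000, 10_000):
    print(f"{outcomes:>6} outcomes: ${fixed_cost_per_outcome(outcomes):,.4f} each")
```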

Cloud Infra

EC2 for the voice gateway ($1,400). Lambda ($880). API Gateway ($420). CloudWatch ($180). S3 for audio storage ($145). NAT Gateway ($280). Reserved instance savings knock off $200 but only if someone's actively managing them.

The Invisible Cost Creep

The pattern we keep seeing: a 10% increase in LLM cost gets noticed immediately. Vector DB queries creeping from 800K to 920K because the RAG pipeline started making 3.2 calls per outcome instead of 2.8? Nobody catches that until the monthly bill review.
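Catching this kind of creep does not need much machinery: compare a per-outcome ratio against a baseline and flag it when it moves past a threshold. The sketch below assumes a 10% threshold. That value is arbitrary, picked only to mirror the LLM example above.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (0.10 means +10%)."""
    return (current - baseline) / baseline


def has_drifted(baseline: float, current: float, threshold: float = 0.10) -> bool:
    """True when the metric moved more than the threshold in either direction."""
    return abs(relative_change(baseline, current)) > threshold


# Figures from the paragraph above.
calls_change = relative_change(2.8, 3.2)           # vector DB calls per outcome
volume_change = relative_change(800_000, 920_000)  # monthly query volume

print(f"calls per outcome: {calls_change:+.1%}")
print(f"monthly query volume: {volume_change:+.1%}")
print("alert" if has_drifted(2.8, 3.2) else "ok")
```

Both metrics moved more than the 10% that would get noticed on the LLM bill, yet neither shows up unless calls per outcome is tracked as its own number.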

The Knowledge Agent being the most expensive agent in the pipeline at $1,865/month because it averages 1,840 tokens per call? That just shows up as "GPT-4o-mini: $4,800" in the OpenAI dashboard.

The dangerous costs aren't the big ones. They're the ones that are growing and invisible.

What Full-Stack Tracking Looks Like

Full-stack tracking means three things: vendor-to-feature mapping (OpenAI's $6K breaks down to $2,400 for the Voice Agent, $2,700 for the Chatbot, and $900 for Escalations); cost per outcome, not cost per call ($1.98 through the Voice Agent, including everything); and cost variance (P50 is $0.82 but P99 is $4.31, roughly a 5x spread, and 6.2% of runs cost over $3).
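Once every run has a total cost attached, the variance numbers are simple to compute. The sketch below uses a nearest-rank percentile over a tiny, hypothetical sample of per-run costs, chosen only so the P50 and P99 echo the figures above. Real distributions will have far more runs and a different tail share.

```python
import math

# Hypothetical per-run costs in USD (illustrative sample, not real data).
run_costs = [0.50, 0.60, 0.70, 0.80, 0.82, 0.90, 1.00, 1.50, 3.20, 4.31]


def percentile(values, p):
    """Nearest-rank percentile: the smallest value covering p% of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


p50 = percentile(run_costs, 50)
p99 = percentile(run_costs, 99)
spread = p99 / p50

# Fraction of runs above a cost ceiling (the $3 ceiling comes from the text above).
share_over_3 = sum(cost > 3 for cost in run_costs) / len(run_costs)

print(f"P50 ${p50:.2f}, P99 ${p99:.2f}, spread {spread:.1f}x")
print(f"runs over $3: {share_over_3:.1%}")
```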

Existing LLM observability tools are great at latency, tokens, and cost per model call. But model calls are an ingredient. We've been trying to show the whole meal.

Infrastructure · Cost Attribution

Track cost per outcome across your AI stack

See how botanu gives engineering teams full visibility into workflow-level unit economics.
