Why Are Developers Burning Through AI Tokens So Fast?
There's a pattern I keep seeing in developer communities right now. People call it token anxiety.
Developers on Claude Max 5x plans burn through their allocation in a few hours. People on the 20x plan hit the wall by mid-afternoon. I've seen posts from people who say they spend more time worrying about running out than actually building.
I've been shipping with Claude Code daily for months. I built ShipUI (17 themes, a CLI, a component registry, a dashboard, telemetry, Stripe integration, landing pages, blog posts) and I've never hit my limits. Not once. Same plan. Same model. Same 24 hours.
I see people running 16 Claude Code sessions in parallel, farming out tasks across multiple terminals. I'm genuinely curious how they're verifying all of that output. For me, I can't test that much at once. Everything that gets built has to get checked. If code is being generated faster than it can be reviewed, I'm not sure that's actually shipping faster.
I'm not claiming this is the right way to work with AI. It's just the workflow that evolved for me after a lot of trial and error. AI is still new, and we all have our own ways of doing things. I've seen people claim a lot of success with different approaches, but I can only verify what I'm doing. I've never prompted for someone else or watched them work, so I have no idea what their sessions actually look like. All I know is that there seems to be a logical way to approach this, and if you can build repeatable patterns into your workflow, you're going to have a better time. I'm 100% convinced of that. But maybe someone else is doing things better. I'd love to hear about it.
Everybody's workflow is different, and what works for me might not apply to someone building a different kind of product. But I have some guesses about what might be contributing to the gap, and I'm genuinely curious what other developers think.
The "just do it for me" trap
One pattern that seems to burn a lot of tokens is treating AI like a vending machine. Paste in a vague prompt, get back 500 lines, realize it's wrong, paste in the whole thing again with corrections, get back another 500 lines.
Every round trip burns tokens on both sides. The input (your prompt plus all the context) and the output (the full response). Do that ten times and you've used more tokens than someone who got it right in two passes.
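To put rough numbers on that round-trip cost, here's a quick back-of-the-envelope calculation. The token counts are made up for illustration; real sessions vary a lot.

```python
# Illustrative only: these are round made-up numbers, not measured values.
PROMPT_TOKENS = 1_500    # a vague prompt plus the pasted context
RESPONSE_TOKENS = 5_000  # roughly 500 lines of generated code

# Ten vague round trips, each re-sending context and getting a full rewrite
vague = 10 * (PROMPT_TOKENS + RESPONSE_TOKENS)

# Two specific passes with the same per-pass cost
specific = 2 * (PROMPT_TOKENS + RESPONSE_TOKENS)

print(vague, specific, vague // specific)  # 65000 13000 5
```

Even with identical per-pass costs, the vague loop burns five times the tokens, and real vague prompts usually carry more context per pass, not less.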
I don't think this is about being smarter. I think it's about being more specific up front.
Context management matters more than I expected
When I work with Claude Code, I'm deliberate about what's in context. I don't dump entire codebases and say "fix it." I point at specific files. I describe what I want in terms of the system I've already built. I reference existing patterns.
This isn't a skill I was born with. It's just something you learn after burning tokens on vague prompts enough times. The tighter your context, the fewer round trips you need, the less you burn.
System knowledge compounds
Here's the thing nobody talks about. If you understand the system you're building, you can describe what you want precisely. If you don't understand it, you end up in a conversation where you're learning and building at the same time. That's expensive.
I don't mean you need to be an expert in everything. I mean knowing your file structure. Knowing your data flow. Knowing what component handles what. When you can say "update the resolver in lib/registry-resolve.ts to use the canonical slug from theme.detailsUrl" instead of "fix the auth thing," you save a massive number of tokens. I wrote more about this in Structuring Repositories for AI Coding Tools.
The CLAUDE.md effect
One pattern that's worked well for me is maintaining a CLAUDE.md at the project root. It tells the AI what the project is, how it's structured, what conventions to follow, what not to do. Every conversation starts with that context already loaded. I covered this idea in more detail in Project Structure Is the Prompt.
Without it, you spend tokens re-explaining your project every session. With it, the AI already knows the rules. That's not a small difference when you're working 8 hours a day.
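For anyone who hasn't set one up, here's a minimal sketch of the shape a CLAUDE.md can take. The project details below are invented for illustration, not taken from ShipUI.

```markdown
# CLAUDE.md (illustrative sketch; project details are invented)

## What this is
A Next.js app with a component registry and a Stripe-backed dashboard.

## Conventions
- TypeScript strict mode; avoid `any`.
- Components live in `components/`, one folder per component.
- Run `npm run lint && npm run build` before committing.

## Don't
- Don't edit generated files under `registry/`.
- Don't add new dependencies without asking first.
```

The exact sections matter less than the fact that the rules exist in one file the AI reads every session, instead of living in your head and getting re-typed into every prompt.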
The safety nets matter
I try to be intentional about this. I'm not going to pretend I write perfect tests for everything, but I do what I can. Linting rules, unit tests where they make sense, making sure the build passes before anything gets merged. Git commit hooks catch a lot before code even gets pushed.
I use CodeRabbit for PR reviews. That's another layer. Between linting, type checking, build verification, and automated code review, there are a lot of safety nets that can catch problems before they become problems.
Security is a big one. AI is trained on code that could have vulnerabilities. If you're generating code and not running any kind of security audit, you might be shipping with open doors you don't even know about. SQL injection, exposed admin routes, leaked credentials. It's not going to be perfect, but running a vulnerability scanner catches things that most people wouldn't think to look for manually.
I know a lot of developers are looking for free tools for everything, and AI is making some of that easier. But personally, I'm comfortable paying for established security tools from companies that have been doing vulnerability scanning for 20 years. Everyone has different experiences with this, and I'm open to hearing what's working for other people.
The point is: you can tell AI to follow security best practices, and it mostly will. But you can't catch everything if you're not looking at the code yourself. Put the guardrails where you can. Linting, type safety, automated security scans, code review. Then the AI-generated code goes through the same gauntlet as anything else. I go deeper into this in Preventing AI Code Hallucinations and My AI Debugging Workflow.
Repeatability saves more than you think
The biggest thing I've learned is this: if you can make a workflow repeatable, you eliminate most of the token burn.
Every time I add a new theme to ShipUI, it follows the same checklist. Scaffold the theme, build the demo, upload to S3, create Stripe products, update the landing page, write the blog post, set env vars. I've turned that into scripts and agent prompts that handle 80 to 90% of the work. The AI isn't figuring it out from scratch each time. It's following a pattern it already knows.
I know it's tempting to just do things ad hoc. It feels faster in the moment. But if you're going to do something more than twice, invest the time to make it repeatable. Build a script. Write a prompt template. Create a skill. Whatever your tool supports.
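As a sketch of what "make it repeatable" can look like in code: encode the checklist as an ordered list of steps, so every run executes the same way. The step functions here are hypothetical stand-ins; a real version would shell out to your actual scaffold, build, and upload tooling.

```python
from typing import Callable

# Hypothetical placeholder steps. In a real project each would call
# the actual scaffold/build/upload tooling instead of printing.
def scaffold_theme(name: str) -> None:
    print(f"scaffold theme: {name}")

def build_demo(name: str) -> None:
    print(f"build demo: {name}")

def upload_assets(name: str) -> None:
    print(f"upload assets: {name}")

# The checklist lives in one place, so every theme ships the same way.
CHECKLIST: list[Callable[[str], None]] = [scaffold_theme, build_demo, upload_assets]

def run_checklist(theme: str) -> list[str]:
    """Run every step in order and return the names of completed steps."""
    completed = []
    for step in CHECKLIST:
        step(theme)
        completed.append(step.__name__)
    return completed

run_checklist("midnight")
```

Once the order lives in code, the AI (or a teammate) never has to rediscover it, and adding a step means editing one list instead of remembering a new ritual.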
One thing I've noticed is that when the same type of task is approached differently each time, the AI has to re-derive the approach from scratch, and token usage grows fast. That's expensive, and not just in tokens. Your time is the real cost. Not the subscription. Your hours.
I'm not sure this is just about prompting
Part of me thinks the token gap is really about how people work, not how they prompt. If you plan before you code, if you break work into focused tasks, if you read code before asking the AI to change it, you naturally use fewer tokens.
But I also wonder if there's something else going on. Maybe certain types of projects burn more. Maybe frontend work is cheaper than backend work because the files are smaller. Maybe some people are using features that consume more (like large file reads or broad searches) without realizing it.
The subscription question
One thing I don't fully understand is why so many developers are using API plans and spending thousands of dollars in tokens when the Claude Max subscription gives you a lot of leeway for a fraction of the cost. I've used OpenClaw to estimate my own usage, and if those numbers are even somewhat accurate, the subscription is a significantly better deal for the way I work. I'm curious what tradeoffs people are seeing there.
Maybe there are valid reasons I'm not seeing. Maybe their use cases require API-level control. I genuinely don't know. But from where I'm sitting, it seems like a lot of money is being spent that doesn't need to be.
AI meets experience differently
Here's where I want to be careful, because I don't want to sound like I have this figured out. I don't.
I've only been using AI seriously for about a year. Before that it was more curiosity than workflow. I'm still trying to understand the real use cases, and I think everyone else is too.
One thing I've noticed is that experience plays a weird role here. Experienced developers bring patterns and instincts that help them direct AI more efficiently. But inexperienced developers sometimes have great ideas and let AI execute in ways that experienced people might overthink. Both can work. AI just meets everyone's background differently.
The thing I keep coming back to is that none of us really know what we don't know. I can say that about myself just as easily as anyone else. I'm building on 15 years of shipping software, and that gives me certain advantages with AI workflows. But someone with a completely different background might discover patterns I'd never think of. That's what makes this moment interesting.
What I'd actually like to know
I'm curious what other developers are experiencing. If you're hitting limits regularly, I'd genuinely like to understand:
- What does your typical session look like?
- How much context are you loading per prompt?
- Are you working on one focused task or jumping between things?
- Do you have project-level instructions (like CLAUDE.md) set up?
This isn't a "you're doing it wrong" post. I think there's a real conversation here about how AI-assisted development scales with experience, and I don't think anyone has a definitive answer yet.
The tools are new. The patterns are still forming. If token anxiety is the bottleneck keeping people from shipping, that's worth understanding.