GitHub Copilot is moving to usage-based billing — announced this week on the GitHub blog. The flat $10 / $19 / $39 per-seat pricing isn’t going away entirely, but each tier now comes with a “premium request” budget, and once your budget runs out you either pay overage or wait for the monthly reset. The fine print matters: not every Copilot interaction counts as a “premium request,” and the exchange rate from a chat turn to a billable unit is opaque enough that you should test it carefully before signing up your team.
Here’s what changes in practice and how to think about whether your team’s bill goes up, down, or sideways.
What’s actually a “premium request”
Per the announcement: classic inline completions (the gray ghost-text suggestions you’ve been using since 2021) stay flat-rate. The premium tier counts:
- Chat conversations against the larger frontier models (GPT-5, Claude Opus 4.5, Gemini 3), meaning anything more expensive than the default model.
- Agent-mode tasks where Copilot writes a multi-file change set autonomously.
- Code review on PRs above a certain diff size.
- Long-context queries (anything >32k tokens of context) regardless of model.
So the “premium request” mechanism is essentially a token-budget proxy without exposing the actual token count. From an accounting perspective that’s annoying — you can’t predict cost from observable inputs. From a product perspective, GitHub is doing what every AI vendor is doing right now: passing through inference cost variance to customers, because their own model bills are now more variable than they were under fixed-rate provider contracts.
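To make the four rules concrete, here is a minimal Python sketch of how an interaction might be classified. It is only a reading of the list above, not GitHub's actual logic: the `Interaction` fields and the `is_premium` function are hypothetical names, and `review_diff_threshold` is left as a required argument because the announcement only says "a certain diff size." Applying the long-context rule to every interaction type is also an assumption.

```python
from dataclasses import dataclass

# Long-context cutoff taken from the list above (>32k tokens of context).
LONG_CONTEXT_TOKENS = 32_000


@dataclass
class Interaction:
    """Hypothetical record of one Copilot interaction (field names are illustrative)."""
    kind: str                       # "completion", "chat", "agent", or "review"
    uses_default_model: bool = True
    context_tokens: int = 0
    diff_lines: int = 0             # only meaningful when kind == "review"


def is_premium(item: Interaction, review_diff_threshold: int) -> bool:
    """Return True if the interaction would count as premium under the four rules.

    review_diff_threshold is a placeholder the caller must supply; the
    announcement does not state the actual diff size.
    """
    if item.context_tokens > LONG_CONTEXT_TOKENS:
        return True                 # long context counts regardless of model
    if item.kind == "agent":
        return True                 # autonomous multi-file change sets
    if item.kind == "chat":
        return not item.uses_default_model
    if item.kind == "review":
        return item.diff_lines > review_diff_threshold
    return False                    # classic inline completions stay flat-rate
```

Even with a classifier like this, the number of billable units each premium interaction consumes remains unknown, which is the part you cannot predict from observable inputs.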
Who pays more, who pays less
From the public budget breakdowns:
- Hobbyist / occasional user on the $10 plan, mostly using inline completion: bill stays the same. No behavior change.
- Heavy chat user who always picks the latest Claude or GPT model: bill goes up. Expect to blow through the included premium budget before the second week of the month, then either accept overages or downgrade your default to the included flat-rate model.
- Agent-mode power user running multi-file refactors a few times a week: bill goes up significantly. A single agent task that writes 10 files is genuinely an order of magnitude more compute than 10 manual completions.
- Enterprise admin with hundreds of seats: variance becomes the main story. Some seats will be near-zero on premium requests, others will exhaust budget. Total bill is hard to predict.
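The arithmetic behind these profiles can be sketched in a few lines of Python. The included budget, the overage rate, and the per-persona request counts below are placeholders chosen for illustration, not published figures; only the $10 seat price comes from the article. Substitute the numbers from your own plan before drawing conclusions.

```python
def monthly_bill(seat_price: float, included: int, used: int, overage_rate: float) -> float:
    """Seat price plus overage for premium requests beyond the included budget."""
    extra = max(0, used - included)
    return seat_price + extra * overage_rate


# Hypothetical values: neither the budget nor the rate is taken from the announcement.
INCLUDED_REQUESTS = 200
OVERAGE_RATE = 0.05  # dollars per premium request over budget

# Hypothetical monthly premium-request counts for three of the profiles above.
personas = {
    "inline-only hobbyist": 0,
    "heavy chat user": 900,
    "agent power user": 2500,
}

for name, used in personas.items():
    bill = monthly_bill(10.0, INCLUDED_REQUESTS, used, OVERAGE_RATE)
    print(f"{name}: ${bill:.2f}")
```

For an enterprise admin, summing `monthly_bill` over every seat gives the total, and the spread of the per-seat results is a rough picture of the variance described above.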
What to do this week
- Check the per-user dashboard for current premium-request usage. GitHub turned this on a few months ago in preview; now it’s the billing source of truth. If heavy users are already hitting limits, the dashboard will show who they are.
- Set a default model. Because the IDE picker remembers the last choice, a team can quietly drift toward whichever frontier model each developer picked last, which is often the most expensive one. Setting a default at the team level pins this.
- Audit agent-mode usage. Most teams do not need agent-mode running daily; it’s the highest-cost interaction. If a developer is running it on every PR, that’s worth looking at.
- Compare against alternatives — Cursor’s per-month flat rate for similar usage, Codeium for inline-completion-only teams, self-hosted on Continue.dev / OpenWebUI plus a model API you control. The math is now more interesting than “Copilot ships with the org’s GitHub plan, just keep using it.”
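One way to act on the dashboard numbers is to project each user's month-to-date count to the end of the month and flag anyone on track to exceed the included budget. The sketch below assumes you have copied per-user counts out of the dashboard by hand; the user names, the counts, and the budget of 200 are all made up, and the linear projection is a simplifying assumption rather than anything GitHub provides.

```python
def project_month_end(used_so_far: int, day_of_month: int, days_in_month: int = 30) -> float:
    """Linear projection of premium requests at month end from month-to-date usage."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    return used_so_far * days_in_month / day_of_month


def users_over_budget(usage: dict[str, int], day_of_month: int, included: int,
                      days_in_month: int = 30) -> list[str]:
    """Return users whose projected usage exceeds the included budget, heaviest first."""
    projected = {
        user: project_month_end(count, day_of_month, days_in_month)
        for user, count in usage.items()
    }
    flagged = [user for user, total in projected.items() if total > included]
    return sorted(flagged, key=lambda user: projected[user], reverse=True)


# Hypothetical month-to-date counts as of day 10, with a hypothetical budget of 200.
usage = {"alice": 40, "bob": 210, "carol": 95}
print(users_over_budget(usage, day_of_month=10, included=200))
```

The flagged list is a starting point for the default-model and agent-mode conversations above, not a reason to ask individuals to cut back.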
The deeper trend
This is the third major AI-tool pricing change this quarter — Anthropic’s tiered Claude Code pricing, Cursor’s “auto” mode swap, now Copilot. The pattern is the same in all three: vendors are realizing that the heaviest 10% of users consume 60–80% of the inference budget, and flat per-seat pricing gives those heavy users an unbounded subsidy at the expense of light users.
Usage-based billing fixes that, but introduces a new failure mode: developers being cost-conscious about asking the AI for help. Some of the productivity wins of the last two years came from the friction-free aspect of “just ask Copilot.” Add a meter, and people start hesitating before they ask. That hesitation is invisible in dashboards but real in throughput.
If you manage an engineering org, the budget conversation now has to include “do not let people self-ration.” Either give them a meaningfully high cap (so they don’t think about it day-to-day) or move them to a tool with predictable pricing. The middle ground (“we have a cap, please be careful”) is the worst of both worlds.
Source: github.blog
