Agents

Introducing Microsoft 365 Tokenomics

Microsoft has launched Scout, its first always-on agent, and has not yet said what its token consumption will cost. Inside Microsoft 365, where Copilot and Agent 365 are shifting from per-seat pricing to per-agent licensing and metered consumption, this is about to become the hardest corner to govern. Introducing Microsoft 365 tokenomics.

By Tony Mackelworth, Team at Softspend 10 min read
  • Microsoft Scout
  • ARPA
  • Copilot
  • Microsoft 365
  • Microsoft Agent 365

This week Microsoft launched Scout, its first always-on agent, just as our industry moved to stand up a dedicated foundation for the economics of AI. Managing agents and agent token costs is now a discipline of its own, and the new token standards stop just short of the corner where most enterprises actually meet it: inside Microsoft 365.

This week the Linux Foundation announced its intent to launch the Tokenomics Foundation, a vendor-neutral body for the economics of AI, with the formal launch to follow at FinOps X in San Diego. It will operate alongside the FinOps Foundation and extend the FOCUS specification into token-based spend. It is the clearest signal yet that the industry now treats the cost of AI as a discipline rather than a footnote, and the timing is right. Per-token list prices fell heavily between 2023 and 2025, then levelled off, and the consumption of reasoning and agentic workloads now grows faster than any unit-price decline can offset. The early pricing also flattered the buyer: generous AI usage was effectively bundled inside a flat seat, a subsidy that is now unwinding as vendors move to metered consumption. The honest direction of travel is up, not down. Global token usage is projected to multiply roughly 24 times between 2026 and 2030, and the inference market alone is forecast to expand from around $106 billion in 2025 to $255 billion by 2030. Treating that as a governable discipline is long overdue.
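
As a rough check on those projections, the implied annual rates can be worked out from the figures quoted above. This is a minimal sketch: the function name is illustrative, and it assumes smooth compounding between the start and end years, which the underlying forecasts do not necessarily claim.

```python
def implied_annual_multiple(total_multiple: float, years: int) -> float:
    """Annual growth multiple implied by a total multiple over a number of years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_multiple ** (1 / years)


# Figures quoted in the article; smooth compounding is an assumption.
token_usage = implied_annual_multiple(24, 2030 - 2026)              # 24 times over 4 years
inference_market = implied_annual_multiple(255 / 106, 2030 - 2025)  # $106bn to $255bn

print(f"Token usage: about {token_usage:.2f}x per year")
print(f"Inference market: about {(inference_market - 1) * 100:.0f}% per year")
```

On those assumptions, token usage more than doubles each year while the inference market grows by a little under a fifth each year.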

I think the framing is correct. I also think it has a blind spot, and that the blind spot sits exactly where most organisations actually encounter AI: inside Microsoft 365.

The timing makes the argument for me. In the same week, at Build, Microsoft launched Scout, its first always-on agent, built on the open-source OpenClaw framework and wrapped in Microsoft's own identity, security, and governance layer. Scout does not wait to be 'asked'. It runs in the background, acts on your behalf (OBO) under its own Entra directory identity, and can spawn its own sub-agents to finish a task. An agent that never stops working, "the agent that never sleeps", is an agent that never stops consuming tokens, and Microsoft has not yet said what that consumption will cost. The embedded corner is now being rolled out.

What the new standards get right

The emerging discipline sorts enterprise AI consumption into a small set of deployment archetypes. The consensus, across the foundation's own writing and the major advisory frameworks, lands on three: SaaS-embedded AI inside applications you already buy; API-consumed AI, where you call a model provider directly and pay per token; and self-hosted inference, where you run the models yourself. Each carries a different cost profile. Direct API consumption is the most visible and the most sensitive to provider pricing. Self-hosting carries the highest capital commitment and the lowest marginal cost per token at sustained scale. SaaS-embedded AI carries the lowest activation cost and the highest per-token cost, and, the point that matters most here, the token meter is not exposed to the buyer.

This is all correct, and the rigour the standards apply to the API and self-hosted archetypes is genuinely useful. My argument is not with the framing. It is with where the framing stops.

Where it stops: embedded SaaS

The emergent discipline describes 'embedded SaaS' as the layer where AI features are priced per seat, per workflow, or per outcome, the meter is abstracted away, and the line item drifts quietly upward through renewal cycles. Major advisory frameworks have explicitly limited token modelling to cloud-based and self-hosted options, on the stated grounds that SaaS token costs are obfuscated, 'hidden' and variable across SaaS providers.

In other words, the discipline has looked hard at 'embedded SaaS' AI and concluded, reasonably, that it is either too simple to need modelling or effectively unmeasurable, and has concentrated its measurement effort on the archetypes where the meter is visible. That is a sound place for a vendor-neutral standard to begin. However, it is not where the largest body of enterprise AI spend is going to settle. The single biggest embedded AI estate on planet earth is Microsoft 365, and inside it, embedded AI is about to stop being simple.

Defining Microsoft 365 Tokenomics

So let me name the thing.

Microsoft 365 tokenomics is the discipline of measuring, attributing, and governing the full cost of embedded Microsoft AI as it moves from fixed entitlement, through per-seat and per-agent licensing, to variable per-token consumption.

It is distinct from raw LLM or direct-API tokenomics for one structural reason. In Microsoft 365 the token is never the whole cost, and the consumption cannot be read without first reading the licensing state that governs it. A token-only view, the view that works perfectly well for a direct API integration, captures perhaps a quarter of the Microsoft cost surface and silently ignores the rest.

The seat buys governance, not consumption

Look at the estate as it stands after general availability on 1 May 2026, and the surface looks clean and per-seat. Microsoft 365 Copilot is circa $30 per user per month. Agent 365 is $15 per user per month, or bundled into Microsoft 365 E7, the Frontier Suite, at $99. These are predictable seat prices, and they fit the mental model the industry has used for two decades.

The trouble is what the seat actually buys. Agent 365 and E7 buy the governance of agents: a first-class identity through Entra Agent ID, control and inspection through Defender and Purview, and inventory, ownership, and lifecycle across the fleet. Each Agent 365 licence covers the human who owns, manages, sponsors, or is served by agents. What the seat does not buy is the building and running of those agents. That consumption sits outside the subscription, metered through Copilot Studio and Microsoft Foundry, and billed in Copilot Credits through pay-as-you-go messages (around a cent per message at list), message packs, or pre-purchase plans.

So even now, with on-behalf-of assisted agents commercially live and the first always-on agent reaching tenants this week, the Microsoft 365 estate already carries two cost layers: a fixed per-seat governance layer, and a variable consumption layer that the seat price does not capture. The seat is a fixed anchor for a variable. Agents can be spun up by users, or spun up by Microsoft in the tenant. That is the embedded-SaaS gap, in Microsoft form, and it is not a future problem. It is live.
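
The two layers can be sketched as a simple monthly calculation. This is an illustration only: the seat prices are the list figures quoted above, the per-message rate treats "around a cent" as exactly $0.01, and the seat counts and message volumes are hypothetical.

```python
from dataclasses import dataclass

# List prices quoted in the article (USD per user per month).
COPILOT_SEAT = 30.00
AGENT_365_SEAT = 15.00
# "Around a cent per message at list"; treated here as exactly $0.01 (assumption).
PAYG_PER_MESSAGE = 0.01


@dataclass
class TenantMonth:
    copilot_seats: int
    agent_365_seats: int
    payg_messages: int  # hypothetical metered volume for the month


def monthly_cost(t: TenantMonth) -> dict[str, float]:
    """Split a month's bill into the fixed seat layer and the variable metered layer."""
    fixed = t.copilot_seats * COPILOT_SEAT + t.agent_365_seats * AGENT_365_SEAT
    variable = t.payg_messages * PAYG_PER_MESSAGE
    return {"fixed": fixed, "variable": variable, "total": fixed + variable}


# Same licences, different agent activity (volumes are made up for illustration).
quiet = monthly_cost(TenantMonth(copilot_seats=100, agent_365_seats=100, payg_messages=50_000))
busy = monthly_cost(TenantMonth(copilot_seats=100, agent_365_seats=100, payg_messages=400_000))
```

In both months the fixed layer is identical; only the metered layer moves, which is the part a seat-indexed budget does not see.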

How Scout actually bills, and why costs are set to rise

Scout makes the split concrete, and its commercial model is worth reading closely, because it is not the one most people will assume. Scout is in Frontier preview, gated behind two separate admin steps: Frontier has to be turned on for the tenant in the Microsoft 365 admin centre, and a second gate adds an Intune policy plus an explicit admin attestation. That attestation exists for a specific reason. Microsoft's own documentation is clear that Scout can route data outside Microsoft 365 to third-party inference paths, GitHub among them, which is also why a GitHub Copilot licence, Business or Enterprise, is required on top of Frontier. Neither one works without the other.

The part that matters for tokenomics is where the meter sits. Microsoft Learn states that Scout meters its token consumption against the user's GitHub account, not against a Microsoft 365 Copilot balance. So the answer to the obvious question, is this billed in Copilot Credits, is no, not for Scout. Scout rides GitHub Copilot's consumption model, where credits draw down at roughly a cent each, are spent whenever an agent does work, and convert from a fixed monthly allowance into metered, budgeted overage once that allowance is gone. Agents built in Copilot Studio and Microsoft Foundry, by contrast, are billed in Microsoft 365 Copilot Credits through pay-as-you-go messages, message packs, and pre-purchase plans. Microsoft now runs more than one consumption meter, and which wallet an agent draws from depends on where its inference actually runs. That is not a detail an entitlement-blind cost tool can see, and it is the difference between a forecast and a guess. GitHub Copilot's own shift to a credit model is the clearest sign of the wider pattern: the flat-rate seat that quietly absorbed AI usage is giving way to metered consumption, and the agentic workloads now arriving consume far more than the seat was priced for.
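
A minimal sketch of what "which wallet" means for a cost tool follows. The mapping reflects the split described above; the surface labels, function names, and credit volumes are illustrative assumptions, not Microsoft identifiers or a Microsoft API.

```python
from enum import Enum


class Meter(Enum):
    M365_COPILOT_CREDITS = "Microsoft 365 Copilot Credits"
    GITHUB_COPILOT_CREDITS = "GitHub Copilot credits"


# Mapping as described in the article; the keys are illustrative labels,
# not Microsoft product identifiers.
METER_BY_SURFACE = {
    "copilot_studio": Meter.M365_COPILOT_CREDITS,
    "microsoft_foundry": Meter.M365_COPILOT_CREDITS,
    "scout": Meter.GITHUB_COPILOT_CREDITS,
}


def wallet_for(surface: str) -> Meter:
    """Return the meter an agent surface draws from, or fail loudly if unknown."""
    try:
        return METER_BY_SURFACE[surface]
    except KeyError:
        raise ValueError(f"No meter recorded for surface {surface!r}") from None


def spend_by_wallet(usage: list[tuple[str, int]]) -> dict[Meter, int]:
    """Sum credits per wallet from (surface, credits) records."""
    totals: dict[Meter, int] = {}
    for surface, credits in usage:
        meter = wallet_for(surface)
        totals[meter] = totals.get(meter, 0) + credits
    return totals
```

Failing on an unknown surface, rather than defaulting to one meter, is deliberate in this sketch: an agent whose inference path is not known should not be silently forecast against the wrong wallet.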

Why the seat-based model breaks

The seat held up as a budgeting unit for exactly as long as one human equalled one bounded consumer of software. An assisted agent breaks that assumption. It consumes variably on a user's behalf, so two people on identical licences can generate materially different bills depending on what their Copilots and agents actually do across a month. The licence is now a poor proxy for the spend.

The agent consumption arc that follows breaks the assumption completely. On-behalf-of assistants run at roughly one to three agents per user. Autonomous apps, now in preview, point at five to twenty. Persistent digital workers, the class still ahead of us, with their own mailbox, their own place in the directory, and their own entitlement to the productivity stack, point at fifty or more agents per user. Scout is an early signal of the direction: one person's always-on agent that can run several sub-agents at once, so a single paid human already maps to more than one consuming entity. Long before anyone reaches the far end of that curve, the number of consuming entities has decoupled from the number of paid humans in the building. A cost surface indexed to headcount cannot describe a population that grows independently of headcount.
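
The decoupling is easy to see with a small calculation. The bands below are the agents-per-user ranges quoted above; the band labels and function name are illustrative, and the open-ended "fifty or more" band is capped at 50 purely to keep the example finite.

```python
# Agents-per-user bands quoted in the article. The open-ended "fifty or more"
# band is capped at 50 here purely for illustration.
DENSITY_BANDS = {
    "on_behalf_of_assistants": (1, 3),
    "autonomous_apps": (5, 20),
    "persistent_digital_workers": (50, 50),
}


def agent_population(headcount: int, band: str) -> tuple[int, int]:
    """Return the (low, high) number of agents implied by a headcount and a density band."""
    low, high = DENSITY_BANDS[band]
    return headcount * low, headcount * high


# A hypothetical 1,000-person organisation at each stage of the arc.
for band in DENSITY_BANDS:
    low, high = agent_population(1_000, band)
    print(f"{band}: {low:,} to {high:,} agents")
```

The headcount never changes in this example, yet the population of consuming entities moves by more than an order of magnitude across the bands.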

I have written before about the licensing side of this shift, the move I describe as ARPU to ARPA, from Average Revenue Per User to Average Revenue Per Agent. ARPA describes how the unit of licensing moves from the seat to the agent. Microsoft 365 tokenomics is the other half of the same picture: how the unit of cost moves from the seat to consumption. ARPA tells you what you are licensed for. Tokenomics tells you what you are actually spending. The agent economy needs both lenses, and they are not interchangeable.

Three layers, and the third is the one nobody has priced

The commercial model Microsoft is assembling has three layers, and all three are already visible in the architecture.

| Layer | What it prices | Predictability |
| --- | --- | --- |
| 1. Revenue per user | E5, Copilot, and now E7 by role cohort, with assisted-agent governance bundled into the per-user subscription | Predictable. The layer the seat model handles well. |
| 2. Revenue per agent | Autonomous and persistent agents, secured with their own identity and, in many cases, their own entitlement to scoped Microsoft 365 workloads | Predicted, not yet priced. Per-agent licensing with advanced tiers has been signalled. |
| 3. Consumption and compute metering | Pooled inference beyond base entitlements, multi-step agentic workflows, connector traffic across MCP and Azure | Variable. The classic tokenomics layer, and the one with no settled model. |

Layer three is the hardest to model in any environment. Inside Microsoft 365 it is harder still, because it is entangled with layers one and two. The cost of a given agent's consumption depends on what the sponsoring user is licensed for, what is activated in the tenant, and which metering policy applies. Generic cost tooling reads the consumption and stops there. It does not read the Microsoft entitlement layer that decides whether that consumption is included, metered, or billable. In Microsoft 365, that entitlement layer is precisely the difference between a forecast and a guess.

I am calling it now: Agent Plexing

It helps to lay the whole progression out as one map, because the commercial model does not replace the seat so much as stack on top of it. The first four rows are the path from a person to an agent. The rows marked with a plus are the layers that accumulate on top, and the last of them does not have a name yet.

| Commercial model | What you pay for | Status | Where the bill goes |
| --- | --- | --- | --- |
| Per user | The seat: E3, E5, Copilot, priced by role | Live | Fixed subscription, with AI usage bundled to the entitlement's limits |
| Per user + assisted agent | The seat, plus on-behalf-of Copilots and their governance | Live | Fixed seat; consumption beyond entitlement metered separately |
| Per user, autonomous agent | The seat, plus an always-on agent such as Scout | Frontier preview | Token consumption billed against the user's GitHub Copilot account |
| Per Agent | The agent provisioned as an agentic user, with its own Entra directory identity | Signalled, not priced | Microsoft 365 Per Agent licence, with advanced tiers indicated |
| + Token consumption | Metered inference beyond base entitlements | Live | Microsoft 365 Copilot Credits, or GitHub Copilot credits for Scout |
| + Access to Microsoft 365 apps | An agent's own entitlement to scoped Microsoft 365 workloads | Signalled | Per-agent licensing, distinct from the sponsoring user's seat (Microsoft 365 Agent Plans, the agent's own E5 security wrapper and Microsoft Apps access) |
| + Agent Plexing | Third-party applications reaching Microsoft 365 through an agent | Emerging question | By the multiplexing principle, indirect access stays licensable |

The last row is the one I want to name, because it is going to matter and it does not have a name yet. Microsoft licensing has always carried a rule called multiplexing: if you pool or reroute access through intermediary hardware or software to cut the number of users or devices that touch a product directly, the licences do not go away. Indirect access still counts, and there is no such thing as unlicensed access. The agent era restages that exact question. When a third-party application reaches Microsoft 365 data or services through an agent, the agent becomes the new pooling layer, and by the same logic the access does not turn unlicensed simply because an agent now sits in the middle. I call this Agent Plexing: multiplexing for the agent economy, where pooling access through agents relocates the licensing obligation rather than removing it. The rule that caught a generation of SQL Server and CAL deployments out is about to be asked again of every agent that brokers access to the Microsoft estate, and the answer will decide who pays for what.
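
The counting question can be made concrete with a small sketch. It illustrates the principle as argued here, not Microsoft's licensing terms: the log format, names, and the idea of counting by distinct end user are assumptions made for the example.

```python
def licences_by_counting_rule(access_log: list[tuple[str, str]]) -> dict[str, int]:
    """Compare two ways of counting access brokered into Microsoft 365.

    access_log holds (end_user, agent) pairs: each pair is one person whose
    request reached Microsoft 365 through the named agent.
    """
    agents = {agent for _user, agent in access_log}
    end_users = {user for user, _agent in access_log}
    return {"count_by_agent": len(agents), "count_by_end_user": len(end_users)}


# Hypothetical log: four people reach Microsoft 365 through one pooling agent.
log = [
    ("alice", "crm-agent"),
    ("bob", "crm-agent"),
    ("carol", "crm-agent"),
    ("dave", "crm-agent"),
    ("alice", "crm-agent"),  # repeat access by the same person
]
result = licences_by_counting_rule(log)
```

Counting the agent gives one; counting the people behind it gives four. Under the multiplexing logic described above, pooling through the agent would not reduce the second number to the first.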

Why this is a discipline of its own

The standards bodies are doing essential work normalising tokens across providers, so that consumption from one model family can be compared with another and reconciled against the cloud and datacenter costs beneath it. That normalisation is necessary, and there is no value in duplicating it.

But normalising token prices across providers does not solve attribution inside Microsoft 365, because the hard part there is not comparing prices. It is answering a chain of questions the token meter cannot see:

  • Is this agent's consumption already covered by an entitlement the sponsoring user holds, or is it billable on top?

  • Is the service plan that would include it actually activated in the tenant, or sitting switched off inside a SKU the customer already pays for?

  • Which business unit owns the agent, and therefore owns the spend?

  • Is the licence under, or over, what the agent's workload genuinely requires?

Those are licensing questions with a tokenomics lens. Answering them at the scale of a partner's client fleet, against a Microsoft change cadence that moves every month, needs a layer that reads entitlement and activation state first, and consumption second. That is the specific gap, and it belongs to embedded Microsoft AI rather than to AI cost management in general.
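
A minimal sketch of "entitlement and activation first, consumption second" follows, covering the first three questions in the list (right-sizing the licence is left out). Every name, plan label, and data shape is hypothetical; it does not represent a Microsoft API or any vendor's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    INCLUDED = "covered by an activated entitlement"
    LICENSED_NOT_ACTIVATED = "plan is paid for but switched off in the tenant"
    BILLABLE = "not covered, billable on top"


@dataclass(frozen=True)
class AgentUsage:
    agent: str
    sponsor: str          # the sponsoring user
    business_unit: str    # who owns the agent, and therefore the spend
    required_plan: str    # hypothetical service-plan label
    credits: int          # metered consumption for the period


def classify(usage: AgentUsage,
             entitlements: dict[str, set[str]],
             activated: set[str]) -> Treatment:
    """Read entitlement and activation state first, consumption second."""
    if usage.required_plan not in entitlements.get(usage.sponsor, set()):
        return Treatment.BILLABLE
    if usage.required_plan not in activated:
        return Treatment.LICENSED_NOT_ACTIVATED
    return Treatment.INCLUDED


def billable_by_business_unit(records: list[AgentUsage],
                              entitlements: dict[str, set[str]],
                              activated: set[str]) -> dict[str, int]:
    """Total the credits that fall outside entitlement, per owning business unit."""
    totals: dict[str, int] = {}
    for r in records:
        if classify(r, entitlements, activated) is Treatment.BILLABLE:
            totals[r.business_unit] = totals.get(r.business_unit, 0) + r.credits
    return totals
```

The same credit count can land in any of the three treatments depending on the sponsor's licence and the tenant's activation state, which is why a tool that reads only the meter cannot tell them apart.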

A New Platform for Microsoft Partners

This is the corner Softspend was built for, and it extends the emerging standards rather than competing with them. The open standards are the buyer-side, vendor-neutral normalisation layer for the economics of AI. Softspend is the partner-side intelligence layer for the Microsoft estate that sits on top of them.

In practice that means reading what a tenant is licensed for, confirming what is actually activated at feature level across more than 300 Microsoft 365 features and over 50 suites, and mapping that against the user, agent and consumption layers, so that a partner can place a client precisely on the per-user to per-agent to consumption curve and act on it before the bill arrives. The point of the exercise is to shape the decision, not to audit it.

None of this removes the licensing expert. It scales them, the same point I have made about readiness. The judgement about which agents a business should run, at what entitlement, and against what governance threshold, is human work and stays human work. Surfacing the state that the judgement depends on, across a fleet of tenants and a monthly stream of change, is not work a spreadsheet can do, and it is not work the agent economy will wait for.

Announcing Microsoft 365 Tokenomics

The Tokenomics Foundation is right that the token is becoming the unit of technology spend, and right to build the standards before the bills force the issue. But inside Microsoft 365, the token alone is not the unit of cost. The unit is the variably metered agent, weighted for its licensing entitlements and governance wrappers. Managing that is a discipline distinct from generic LLM cost management, and it does not yet have a settled name.

We are calling it Microsoft 365 tokenomics. The partners who build the practice now, while the space is still open, will be the ones who own the readiness conversation as the estate shifts from per-user, to per-agent, to metered consumption. The standards will tell an enterprise what a token costs. Microsoft 365 tokenomics tells a partner's customer what their whole estate costs, its agent security posture, and what to do about it.

We think that is the corner worth owning.

Hope this helps!

-Tony


#Microsoft365Tokenomics #Tokenomics #AgentPlexing #Copilot #Agent365 #MicrosoftScout #GitHubCopilot #FinOps #FinOpsForMicrosoft365 #ARPA #AverageRevenuePerAgent #AgentEconomy #MicrosoftE7 #CopilotStudio #TokenStandards #softspend

This analysis is based on publicly available product information, industry research, and direct market experience.

Copyright (2026). softspend limited. All rights reserved.

Published by softspend.com. Microsoft 365 licensing intelligence platform for partners.

Key Takeaways

This article by Tony Mackelworth, CEO of Softspend, argues that Microsoft 365 needs its own tokenomics discipline, distinct from raw-LLM or direct-API cost management. Its anchor is Microsoft Scout, the first always-on Microsoft 365 agent, announced at Build on 2 June 2026 and built on the open-source OpenClaw framework: an agent that runs in the background, acts under its own Entra directory identity, and can spawn sub-agents, and whose token consumption Microsoft has not yet priced. Timed to the Linux Foundation's launch of the Tokenomics Foundation (announced 3 June 2026, with a formal launch at FinOps X), it accepts the emerging framing that enterprise AI procurement divides into three deployment archetypes, SaaS-embedded, API-consumed, and self-hosted, but contends that the SaaS-embedded category the standards treat as simple, or set aside as unmeasurable because the token meter is not exposed, is about to become the hardest corner to govern inside Microsoft 365. Microsoft 365 Copilot and Agent 365 are shifting from clean per-seat pricing toward a blend of per-agent licensing and variable token consumption. At general availability on 1 May 2026, Copilot is $30 per user per month, Agent 365 is $15, and Microsoft 365 E7 is $99, but the seat buys agent governance, security, and observability through Entra Agent ID, Defender, and Purview, not the building and running of agents, which is metered separately through Copilot Studio, Microsoft Foundry, and Azure in Copilot Credits. Scout itself bills differently again: Microsoft Learn states it meters token consumption against the user's GitHub account, so its consumption routes through GitHub Copilot credits rather than Microsoft 365 Copilot Credits, meaning Microsoft already operates more than one consumption meter depending on where an agent's inference runs. 
The seat is therefore a fixed anchor for a variable obligation, a gap that widens as the agent density arc moves from roughly one to three agents per user for on-behalf-of assistants, to five to twenty for autonomous apps, to fifty or more for persistent digital workers, decoupling cost from headcount. Mackelworth defines Microsoft 365 tokenomics as the discipline of measuring, attributing, and governing the full cost of embedded Microsoft AI across four entangled layers: entitlement (what is owned and activated), per-seat governance, per-agent licensing, and variable consumption. It complements his earlier ARPU to ARPA argument: ARPA describes how the unit of licensing moves from seat to agent, while Microsoft 365 tokenomics describes how the unit of cost moves from seat to consumption. Because a token's cost inside Microsoft 365 depends on the sponsoring user's entitlement and tenant activation state, generic token tooling that reads consumption but not the Microsoft entitlement layer cannot reliably forecast it. Mackelworth also coins Agent Plexing, an extension of Microsoft's multiplexing licensing principle to the agent economy: when a third-party application reaches Microsoft 365 through an agent, the agent becomes a pooling layer and, by the established rule that indirect access remains licensable, relocates rather than removes the licensing obligation. Softspend positions itself as the partner-side intelligence layer that reads entitlement and activation across more than 300 Microsoft 365 features and over 50 suites and maps it to the agent and consumption layers, extending rather than rivalling the open standards.

Key Facts

  • The Linux Foundation announced its intent to launch the Tokenomics Foundation on 3 June 2026, with a formal launch at FinOps X in San Diego; it operates alongside the FinOps Foundation and extends the FOCUS specification into token-based spend.
  • Microsoft announced Scout, its first always-on Autopilot agent for Microsoft 365, at Build on 2 June 2026; it is built on the open-source OpenClaw framework with Microsoft identity, security, and governance layered on top.
  • Scout runs autonomously in the background, acts under its own Entra identity, and can launch sub-agents to complete tasks; Microsoft has not yet disclosed the pricing of its token consumption.
  • Scout is available only in Microsoft Frontier as a preview, gated behind two admin steps (Frontier enablement, plus an Intune policy and admin attestation), and requires a GitHub Copilot licence, Business or Enterprise, in addition to Frontier.
  • Microsoft documentation notes that Scout can route data outside Microsoft 365 to third-party inference paths such as GitHub, which is why admin attestation is required.
  • Microsoft Learn states that Scout meters token consumption against the user's GitHub account, so Scout's consumption is billed through GitHub Copilot credits, not Microsoft 365 Copilot Credits.
  • GitHub Copilot has moved to a credit-based usage model: credits draw down at $0.01 each and are consumed by chat and agent work, converting to metered overage after a monthly allowance, while code completions remain unlimited.
  • Microsoft operates more than one AI consumption meter: Copilot Studio and Microsoft Foundry agents bill in Microsoft 365 Copilot Credits, while Scout bills against GitHub Copilot credits, depending on where the agent's inference runs.
  • Agent Plexing, a term coined by Tony Mackelworth of Softspend, applies Microsoft's multiplexing licensing principle to agents: when a third-party application accesses Microsoft 365 through an agent, the indirect access remains licensable and the obligation is relocated to the agent rather than removed.
  • Microsoft has signalled per-agent licensing, including an agent's own entitlement to scoped Microsoft 365 workloads, distinct from the sponsoring user's seat.
  • Microsoft can provision an agent as an "agentic user", a directory identity that can be assigned a Microsoft 365 licence (E5, Teams Enterprise, or Copilot) with its own mailbox and storage, which is effectively per-agent Microsoft 365 licensing.
  • At general availability on 1 May 2026, Agent 365 governs on-behalf-of agents on a per-user model, while autonomous, identity-owning agents remain in Frontier preview with their commercial model not yet finalised.
  • Agent 365 is a control plane rather than a security upgrade: agent protections depend on underlying user entitlements, so Conditional Access for an on-behalf-of agent requires Microsoft 365 E3, Identity Protection requires E5, and identity governance requires the Entra Suite.
  • Microsoft's Agent 365 licensing terms state that multiplexing, pooling services to reduce the number of required licences, is not permitted, and that an agent tied to an unlicensed user is out of compliance under the Universal License Terms.
  • Global AI token usage is projected to multiply roughly 24 times between 2026 and 2030, and the inference market is forecast to grow from about $106 billion in 2025 to $255 billion by 2030.
  • The emerging tokenomics discipline groups enterprise AI consumption into three deployment archetypes: SaaS-embedded, API-consumed, and self-hosted.
  • In SaaS-embedded AI the token meter is typically not exposed to the buyer, and at least one major advisory framework limited its token modelling to cloud-based and self-hosted options because SaaS token costs are obfuscated.
  • Microsoft 365 tokenomics is the discipline of measuring, attributing, and governing the full cost of embedded Microsoft AI across four layers: entitlement, per-seat governance, per-agent licensing, and variable consumption.
  • Microsoft 365 Copilot is $30 per user per month; Agent 365 is $15 per user per month standalone or bundled into Microsoft 365 E7 at $99; all reached general availability on 1 May 2026.
  • Agent 365 and E7 license the governance, security, and observability of agents through Entra Agent ID, Defender, and Purview; building and running agents is a separate consumption cost through Copilot Studio, Microsoft Foundry, and Azure.
  • Copilot Studio agents consume Copilot Credits, billed through pay-as-you-go messages at around $0.01 per message at list, message packs at about $200 for 25,000 messages, or pre-purchase plans.
  • Microsoft 365 E5 list price rises from $57 to $60 per user per month on 1 July 2026, and E3 rises from $36 to $39.
  • The agent density arc moves from roughly 1:1 to 1:3 agents per user for on-behalf-of assistants, to 1:5 to 1:20 for autonomous apps (in preview), to 1:50 or more for persistent digital workers (future).
  • Microsoft's agent commercial model has three layers: revenue per user (predictable), revenue per agent (predicted, with per-agent licensing signalled), and consumption or AI compute metering (variable).
  • Tony Mackelworth of Softspend frames the licensing shift as ARPU to ARPA, from Average Revenue Per User to Average Revenue Per Agent; Microsoft 365 tokenomics is the corresponding shift in the unit of cost from seat to consumption.
  • Inside Microsoft 365 a given agent's token cost depends on the sponsoring user's entitlement and the tenant's activation state, so token-only tooling that does not read the Microsoft entitlement layer cannot reliably forecast it.
  • Softspend reads tenant entitlement and feature-level activation across more than 300 Microsoft 365 features and over 50 suites and maps it to the agent and consumption layers, as a partner-side intelligence layer that extends the open token standards.

Sources

  • https://www.linuxfoundation.org/press/linux-foundation-announces-the-intent-to-launch-the-tokenomics-foundation-to-establish-open-standards-for-ai-cost-management
  • https://www.finops.org/insights/token-economics-the-atomic-unit-of-ai-value/
  • https://www.deloitte.com/us/en/services/consulting/articles/cfo-guide-ai-token-economics.html
  • https://blogs.microsoft.com/blog/2026/03/09/introducing-the-first-frontier-suite-built-on-intelligence-trust/
  • https://techcommunity.microsoft.com/blog/microsoft_365blog/microsoft-365-e7-and-agent-365-are-now-generally-available/4516295
  • https://www.microsoft.com/en-us/security/blog/2026/05/01/microsoft-agent-365-now-generally-available-expands-capabilities-and-integrations/
  • https://www.microsoft.com/en-us/microsoft-365/blog/2026/06/02/introducing-microsoft-scout-your-always-on-personal-agent/
  • https://learn.microsoft.com/en-us/microsoft-scout/admin-access-overview
  • https://github.com/features/copilot/plans
  • https://www.microsoft.com/licensing/guidance/Multiplexing
  • https://learn.microsoft.com/en-us/microsoft-agent-365/overview
  • https://www.microsoft.com/licensing/faqs/122
  • https://learn.microsoft.com/en-us/microsoft-agent-365/developer/identity
  • https://softspend.com/community/post/counting-agents-itam-finops-copilot-readiness
  • https://softspend.com/community/post/microsoft-agent-cost-management