Securing authenticated agentic AI

15 May 2025 🔖 security iam


I found 3 excellent blog posts by Stytch about the intersection of traditional IT concerns and AI agents:

How to think through enhancing legacy workloads so that customers’ and employees’ AI agents can use them:
- The age of agent experience
- Key point: the workloads might need to become OAuth providers even if they previously weren’t, so that they can specify appropriate authentication (“authN”) & authorization (“authZ”).
Intrusion detection patterns in the age of AI agents:
- Detecting AI agent use & abuse
Authorization (“authZ”) for AI agents:
- Handling AI agent permissions

Honestly, the whole blog looks interesting, at a glance.

Of course, every article ends by saying you should buy their authentication service.

But overall, they seem to have a pretty great team of writers thinking about big-picture architecture, observability, & IAM questions in the age of AI – both “feature”-wise and “defense”-wise.

I found it after Googling a term Netlify came up with recently: “agent experience,” or “AX,” which is meant to follow on the concepts of “user experience” (“UX”) and “developer experience” (“DX”).

We need better authZ, fast

One thing I didn’t see right away was how to refactor an API so that two kinds of scope behave differently. All ResourceA.read-scoped transactions would be fine for a chatbot to do on behalf of User #123 as soon as that human authorized their chatbot to work on their behalf in that way. But even once User #123 also gave their chatbot permission to assume their identity for ResourceB.write-scoped actions, the human would still have to manually sign off on every single write transaction against Resource B that the chatbot attempted on their behalf.

Some research I did indicated that the API should probably be refactored to have a “pending transactions” queue, and that some sort of out-of-band human approval step could be the trigger that releases transactions for execution. (It couldn’t just be the human saying “I approve” in the chatbot, or the chatbot might learn to lie to the API on the human’s behalf.)
- Which made me think that if that’s how we’re gonna write APIs from now on, wow, we’re gonna need some standards for returning to where you left off, the way OAuth has a redirect URL syntax.
- And we’re gonna need granular, intuitive controls, the way phone operating systems let you deny or revoke certain types of permissions from an app. You can break certain functionality at your own risk: you’re allowed to decide that your texting app can’t access your camera, if that’s meaningful to you, even though most people would probably leave it on at all times. And phone operating systems make it pretty obvious how to do it (e.g. “always ask” / “only this time” options).
  - Heck, we don’t even have enough of this as it is in conventional OAuth. I passionately hate the way GitHub doesn’t let organization owners cherry-pick which of a GitHub App’s desired scopes will actually be allowed. (It’s 2025, and it’s still like the bad old days of all-or-nothing phone app permission approval.)
- The research chatbot suggested “Open Human Approval Protocol,” which cracks me up because “O.H.A.P.” reminds me of the word “mishap,” which seems appropriate.
That said, a company called Descope seems to have come up with a much catchier name for the concept I’m mulling over: “progressive scoping.”
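Here’s a minimal sketch of what that “pending transactions” queue could look like, assuming an in-memory store. Every name in it (`WriteQueue`, `NEEDS_APPROVAL`, `code_for_human`) is hypothetical and made up for illustration; it isn’t any vendor’s API. The point it tries to show is that the approval secret travels to the human over a separate channel, so the agent can submit a write but can’t release it.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    EXECUTED = "executed"


@dataclass
class PendingWrite:
    user_id: str
    scope: str
    payload: dict
    # Secret delivered to the human out of band (e.g. a push notification).
    # The agent never sees it, so it cannot approve its own request.
    approval_code: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    status: Status = Status.PENDING


class WriteQueue:
    """Holds agent-submitted transactions until a human releases them."""

    # Hypothetical policy: scopes that need sign-off on every transaction.
    NEEDS_APPROVAL = {"ResourceB.write"}

    def __init__(self):
        self._pending = {}   # transaction id -> PendingWrite
        self.executed = []   # payloads that were actually carried out

    def submit(self, user_id, scope, payload):
        """Called by the agent. Returns (transaction id or None, status)."""
        if scope not in self.NEEDS_APPROVAL:
            self.executed.append(payload)
            return None, Status.EXECUTED
        tx_id = secrets.token_hex(8)
        self._pending[tx_id] = PendingWrite(user_id, scope, payload)
        return tx_id, Status.PENDING

    def code_for_human(self, tx_id):
        """Stand-in for the out-of-band channel that reaches the human."""
        return self._pending[tx_id].approval_code

    def approve(self, tx_id, approval_code):
        """Called from the out-of-band channel, never by the agent."""
        tx = self._pending[tx_id]
        if tx.status is not Status.PENDING:
            return tx.status
        if not secrets.compare_digest(tx.approval_code, approval_code):
            return tx.status  # wrong code: the write stays queued
        self.executed.append(tx.payload)
        tx.status = Status.EXECUTED
        return tx.status
```

A real version would also need expiry on queued items and some way for the agent to resume once the write is released, which is exactly the “returning to where you left off” standard that doesn’t seem to exist yet.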

It’s a shame that the OAuth spec doesn’t have a way to make certain scope grants time out faster than others. For example, leave ResourceA.read renewable w/o human intervention for days, but make ResourceB.write require human reapproval after 20 minutes.
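To make the idea concrete, here’s a rough sketch of per-scope expiry tracked on the resource-server side. Nothing here is part of OAuth; `SCOPE_TTL_SECONDS` and `Grant` are names I’m assuming for illustration, and the 3-day figure is just a placeholder for “days.”

```python
import time

# Hypothetical policy: how long each scope stays valid after human approval.
SCOPE_TTL_SECONDS = {
    "ResourceA.read": 3 * 24 * 60 * 60,  # placeholder for "days"
    "ResourceB.write": 20 * 60,          # 20 minutes
}


class Grant:
    """Tracks a separate expiry per scope instead of one expiry per token."""

    def __init__(self):
        self._expires_at = {}  # scope -> absolute expiry time in seconds

    def approve(self, scope, now=None):
        """Record a fresh human approval for one scope."""
        now = time.time() if now is None else now
        self._expires_at[scope] = now + SCOPE_TTL_SECONDS[scope]

    def allows(self, scope, now=None):
        """True only while this scope's own clock has not run out."""
        now = time.time() if now is None else now
        return now < self._expires_at.get(scope, 0.0)
```

With both scopes approved at the same moment, a check 21 minutes later would still allow `ResourceA.read` but refuse `ResourceB.write` until the human reapproves it.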

Today’s authZ safety nets seem made for humans being the only source of nondeterministic action (that is, potentially exhibiting different behaviors on different rounds of performing an action).

Sure, we let computer scripts OAuth into our accounts and run at midnight while we sleep, but we’ve counted on them being programmed in a deterministic way (that is, given a particular input, they will always produce the same output).

Heck, humans are actually pretty deterministic. We presume outsiders are always trying to hack us. We presume insiders could try to hack us but generally aren’t going to get frustrated and keep trying harder if we don’t let them.

Anyway, authorization is going to have to be so different when machines suddenly can behave unpredictably!

Yikes. I don’t think a lot of us are ready to protect humans from surprises (nondeterminism) when they start trying to let chatbots help them do things!

One more cool link

Service Design for AI: Why Human Experience (HX) Matters gets at the broader concept behind why I’m horrified by authZ and what I meant when I said that humans are “pretty deterministic” within their day-job contexts.

“Disintermediation in AI-mediated systems is the process by which people become separated from direct relationships, capabilities, and decision-making as AI interfaces interpret and act on their behalf. For example, when AI chatbots become the primary way to access customer service, people don’t just lose direct contact - they lose the ability to express needs in their own terms, must learn to communicate in ways the AI understands, and lose access to human judgment in complex situations.”

People are going to start using agentic AI tools to help them get tasks done that they already didn’t quite understand how to do. Part of our responsibility, as maintainers of systems that their tools are going to interact with on their behalf, is to lock down those systems’ authZ in ways that improve the human experience, and that don’t leave humans wondering why everything’s broken once they start using their tool against our service. We might’ve designed that service knowing about deterministic user-helper tools (e.g. web browser extensions), but deterministic machines are no longer all that our users might bring to help them use our apps/APIs/etc.

Additional summary

I told a colleague on 10/15/2025:

I think the main danger with putting MCPs in front of deterministic APIs, without first redesigning the API layer…

…is that existing international standards for API authZ are old.

Therefore, they’re based on presuming that:

nondeterministic “intent” when using them is at human-scale rates/degrees of mind-changing/gullibility…

…not machine-scale rates/degrees of variance/gullibility.

Old API authZ is based on the assumption that it’s safe to grant “write” access to a nondeterministic entity (a human) for longer than a few seconds/minutes at a time because…

A generally competent+nonmalicious human will probably wake up generally competent+nonmalicious the next day.
And if they were generally competent+nonmalicious 5 minutes ago, they probably still are now, and probably still will be in another 5 minutes.

(in other words, the physical body’s aversion to poverty/jail makes humans, compared to nondeterministic machines, surprisingly deterministic and low-variance from one moment to the next).

Even if anything about competence/intent were to change in a human, there’s probably only so much gullible/incompetent/malicious work a human can get done (scale) before they need to sleep.

(and others catch up and notice, and they get caught and remediated)

Gartner link

Update 11/3/2025: Another colleague discussed Erik Wahlstrom’s 10/9/25 Gartner article #G00836769 (“Access Tokens: Lifetimes, Compensating Controls, Trust and Delegation”) with me.

It’s a brief goldmine of excellent, actionable, and enlightening details about how enterprises should think about OAuth 2.0 when designing and implementing their business solutions. If you’re lucky enough to be able to read it, do so.

Nondeterminism not mentioned

I was a little surprised that Gartner didn’t mention the fundamental “nondeterminism” (and sometimes “frequency”) scale-up that “AI agents” introduce into the behavior patterns of identity “principals,” compared to the way “principals” could be expected to behave before “AI agents” emerged. (Which is what most of the blog post above is about.) Gartner just writes:

“It’s reasonable to revisit every [legacy and upcoming solution] decision and use case in the AI era.”

I guess … maybe … the exact nuance of why AI agency is “threatening” from an IAM perspective isn’t relevant to how to solve the threat, so it wasn’t worth nerding out about in this article?

It surprised me that Gartner didn’t explain what about the threat model recently changed. But perhaps it shouldn’t have.

Article takeaways

My TL;DR takeaways:

OAuth hygiene

Gartner believes that, as with a lot of things, hygiene / consistency / thoroughness generally beat panic.

Gartner makes an excellent point that there’s a decent chance that the incoming influx of “fixes” to this problem of scaled-up nondeterminism (e.g. “progressive scoping” bolt-ons) could themselves be buggy.

So while enterprises wait for the landscape of solutions to settle, Gartner believes that enterprises should spend their time battening down the hatches of what they’ve already got for identity, access management, OAuth 2.0, etc.

Identity architects

Gartner seems to believe that ideally, the eyes of a professional identity architect should land on every single greenfield & brownfield application/solution an enterprise has ever designed/implemented. (In addition to, say, getting a broader security architect’s, network architect’s, etc. eyes on it.) This is because:

It’s really hard for a non-expert.
While blanket policies often can solve the “it’s hard” problem by saying “just do this and you’ll be fine,” that won’t do for OAuth 2.0 success. “Battening down the hatches,” in the world of OAuth 2.0, involves a lot of “it depends” per-solution custom tailoring, so it’s important to have actual human experts involved in each one.
- (That said, if you don’t have identity architects, article #G00836769 gives excellent per-solution examples of the recommendations a trained identity architect would likely make.)
The internet is full of bad OAuth 2.0 advice, including, sadly, too many official product vendor docs. Which means that any application designers/implementers who don’t have the services of a trained identity architect handy are going to see Google’s AI search summary repeating lousy advice.
- (There are other domains of application design & implementation where the internet is full of great advice and therefore so are LLM chatbots! This ain’t one of them. Not yet, anyway, according to Gartner.)

More miscellany

5/14/2026: Added /u/Mooshux Reddit comment, which captures what frustrates me so much about the idea of “nondeterministic writes,” both attended and unattended (as well as nondeterministic reads, if your concern is data exfiltration):

“The autonomy framing is where I’d focus the concern. An agent that calls humans when it’s not sure is a fundamentally different risk profile from one that just acts.

“What makes autonomous agents genuinely dangerous isn’t that they’ll go rogue.

“It’s that a legitimate action taken by a compromised or misdirected agent looks exactly like a legitimate action. The security logs show valid credentials, valid API calls, valid session. By the time you notice something’s wrong, the damage is done.

“Scope the credentials first.

“Give the agent only what it needs for the specific task it’s authorized to perform right now, not a key that covers everything it might ever need.”

What frustrates me, though, is that I still don’t see enough just-in-time authZ grant mechanisms in OAuth2 and its leading/hyperscale-level implementations to follow Mooshux’s excellent advice. 😡
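For illustration, this is roughly the shape of grant I wish I could just turn on: a credential minted for one action on one resource, usable once, and dead within seconds. It’s a toy in-memory sketch, and `TaskGrantIssuer`, `mint`, and `redeem` are hypothetical names, not anything OAuth 2.0 or a cloud provider defines.

```python
import secrets
import time


class TaskGrantIssuer:
    """Mints one-shot grants bound to a single action on a single resource."""

    def __init__(self, ttl_seconds=30):
        self.ttl_seconds = ttl_seconds
        self._grants = {}  # token -> (action, resource_id, expires_at)

    def mint(self, action, resource_id, now=None):
        """Issue a token for exactly one (action, resource) pair."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(24)
        self._grants[token] = (action, resource_id, now + self.ttl_seconds)
        return token

    def redeem(self, token, action, resource_id, now=None):
        """Return True at most once per token, and only for an exact match."""
        now = time.time() if now is None else now
        # Any redemption attempt consumes the grant, successful or not.
        grant = self._grants.pop(token, None)
        if grant is None:
            return False
        granted_action, granted_resource, expires_at = grant
        matches = (action, resource_id) == (granted_action, granted_resource)
        return matches and now < expires_at
```

Burning the token on a failed attempt is a deliberate choice in this sketch: an agent that asks for the wrong thing has to go back and request a new grant, which is one more point where something (or someone) can say no.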

(Found Mooshux’s comment when searching for reactions to AWS’s “DevOps Agent” as advertised on episode 675 of Screaming In The Cloud. Another great comment was /u/fredjutsu’s reaction, “lol – literally spending 100x more on compute than just running cron jobs with s**t that already works”)

5/14/2026: I asked a recently-trained LLM:

See this article that Katie Kodes wrote. Is the machine-ultra-short-lived-JIT-authZ-escalation landscape catching up yet among the hyperscale-level identity & access control providers, or is the world really just irresponsibly plowing ahead with nondeterministic tooling? 😡

Here was its response:


I don’t have the article link — could you share the URL to Katie Kodes’ post? Once I have it I can fetch it and weigh in on where the hyperscalers (Entra ID, AWS IAM/STS, GCP IAM) actually stand on machine-to-machine, ultra-short-lived, JIT-elevated authorization versus the “ship nondeterministic agents and pray” status quo.

A few things that would sharpen the answer if you can confirm:

Are you focused on workload/machine identity specifically (SPIFFE/SPIRE, workload identity federation, mTLS-bound tokens), or also human JIT elevation (PIM, AWS IAM Identity Center session policies, GCP PAM)?
Is “nondeterministic tooling” here pointing at LLM agents acting with broad standing privileges, or something else (e.g., Terraform drift, generic CI/CD service principals)?

While you grab the link, the short version of my prior: the primitives are largely there at the hyperscalers (workload identity federation, attribute-based conditions, sub-hour token TTLs, request-time policy evaluation, PIM/PAM-style time-boxed grants), but the default postures and tooling ergonomics still push teams toward long-lived, over-scoped credentials — and the agentic-AI wave is racing ahead of the controls that would make per-action, per-prompt authorization checks practical. So “catching up” is technically yes, operationally mostly no.

Send the link and I’ll ground this against what Katie actually argued.

(Note: I pasted in the article)

Short answer: the primitives are mostly there at the hyperscalers, but per-action, per-intent, sub-minute JIT authZ for machine principals — which is what Katie (and Mooshux) are actually asking for — is essentially not a shipping product yet. The world is plowing ahead.

Here’s the honest state of play as of mid-2026:

What hyperscalers have shipped (the “primitives”)

Short-lived workload tokens. AWS STS AssumeRole (default 1h, min 15min), GCP service account access tokens (1h), Entra ID workload identity federation + client assertions (1h). All bounded, all refreshable.
Federated workload identity (no long-lived secrets). AWS IAM Roles Anywhere / IRSA / EKS Pod Identity, GCP Workload Identity Federation, Entra Workload Identity Federation, SPIFFE/SPIRE in the OSS world. This part is genuinely good now.
Session/scope-down policies. AWS STS session policies, GCP downscoped credentials (CAB tokens — literally designed to mint a narrower token for a single upcoming call), Entra .default + incremental consent, OAuth 2.0 RAR (RFC 9396, “Rich Authorization Requests”) and Step-Up Authentication Challenge (RFC 9470).
Human JIT elevation. Entra PIM, AWS IAM Identity Center session policies + temporary elevated access, GCP Privileged Access Manager (GA’d 2024). These work — for humans.
Policy-at-request-time engines. Cedar (AWS Verified Permissions), OPA/Rego, OpenFGA, Google Zanzibar-likes. Fine-grained, can evaluate per call.

What’s still missing — and is exactly Katie’s complaint

No standardized “per-tool-call” token exchange in OAuth. RFC 8693 (token exchange) and RAR exist, but no hyperscaler ships a turnkey flow that says “mint me a 30-second token good for exactly ResourceB.write on row 47, then die.” You have to build it. Most don’t.
No native “different scopes, different TTLs” in any major IdP. OAuth still treats scope as a flat bag with one expiry. Descope’s “progressive scoping,” Stytch’s connected-apps stuff, WorkOS, Auth0 FGA — all bolt-ons. Entra/Okta/Cognito/Google IdP: nope.
No standard out-of-band human-approval callback. Exactly the “OHAP” Katie laughs about. CIBA (Client-Initiated Backchannel Authentication, OIDC) is the closest existing standard and it’s almost never deployed for this use case. MCP itself has no approval-callback spec — it punts to the host app.
JIT for machine principals barely exists as a product. PIM/PAM are explicitly human-facing. For a service principal / managed identity, the assumption is still “grant the role, leave it on, rotate the secret.” The Mooshux quote (“scope the credentials first … for the specific task … right now”) describes a thing you can build on AWS/GCP/Entra but cannot buy.
Agent identity is unsolved. Is the agent the user? A delegated principal? Its own workload? OAuth 2.0 doesn’t have a clean “acting-on-behalf-of with attenuated, revocable, time-boxed delegation” story. Google’s draft “agent-to-agent” / A2A stuff and the various MCP auth drafts are early and contested.
MCP servers are mostly fronting APIs with a single long-lived OAuth token per user. This is the exact anti-pattern Katie names. The MCP authorization spec (March 2025 revision) finally mandated PKCE and resource-indicator scoping, but it doesn’t require per-tool-call token minting, and almost no MCP server in the wild does it.

So: catching up, or plowing ahead irresponsibly?

Both, but the ratio is bad. The standards bodies (IETF OAuth WG, OpenID Foundation) are actively working on this — Transaction Tokens (draft-ietf-oauth-transaction-tokens), updates to token exchange, GNAP as a longer-term replacement for OAuth 2.0 — but none of it is shipping in Entra/AWS/GCP as a default-on, ergonomic feature. Meanwhile every vendor is shipping agent frameworks that happily pass a user.read user.write Mail.ReadWrite Files.ReadWrite.All token into an LLM’s tool loop.

Katie’s frustration is correct and well-calibrated. The right shorthand is: we have all the LEGO bricks for ultra-short-lived, JIT, attenuated machine authZ, and almost no one ships the assembled model. The hyperscalers are letting the agent platforms outrun the IAM platforms, and the agent platforms are betting “good enough scoping + good enough logging” will hold until the IAM side catches up. Gartner’s “batten the hatches on what you have” advice is basically an admission of that gap.

The genuinely useful things she could ask vendors for today, by name:

AWS: AssumeRole with a tight inline session policy per agent tool call, 15-min TTL, plus CloudTrail + IAM Access Analyzer policy-narrowing.
GCP: Downscoped Credentials (CAB) — literally the closest existing thing to per-call JIT.
Entra ID: Continuous Access Evaluation + conditional access auth context + step-up via RFC 9470 for write scopes; PIM for any human approver in the loop.
Cross-cutting: OAuth Step-Up Authentication Challenge (RFC 9470) and RAR (RFC 9396) — both are the standards-track answer to “make ResourceB.write require fresh approval.” Both are barely deployed.

So: not crazy to be 😡. The standards exist or are close; the shipped products lag; the agent ecosystem doesn’t wait.
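Of the suggestions in that response, the AWS one (AssumeRole with a tight inline session policy and a 15-minute TTL) is easy to sketch. The helper below only builds the arguments; `assume_role_kwargs`, the tool name, and the ARNs in the usage note are hypothetical placeholders. `RoleArn`, `RoleSessionName`, `Policy`, and `DurationSeconds` are real AssumeRole parameters, and a session policy can only narrow what the role already allows, never widen it.

```python
import json


def assume_role_kwargs(role_arn, tool_name, actions, resource_arns):
    """Build arguments for an STS AssumeRole call scoped to one tool call."""
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": list(resource_arns),
            }
        ],
    }
    return {
        "RoleArn": role_arn,
        # Session names are capped at 64 characters.
        "RoleSessionName": f"agent-{tool_name}"[:64],
        # Inline session policy: effective permissions are the intersection
        # of this document and the role's own policies.
        "Policy": json.dumps(session_policy),
        "DurationSeconds": 900,  # 15 minutes, the minimum AssumeRole accepts
    }
```

Assuming boto3 is installed and credentials are configured, the result could be passed along as `boto3.client("sts").assume_role(**assume_role_kwargs(...))`, with a fresh call per agent tool invocation. Even then, 15 minutes is a long way from the 30-second, single-row token described above.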
