E2E and Synthetic Testing Considered Harmful
01 Aug 2025
“End-to-end test automation and synthetic monitoring considered harmful” – was I clickbaitey enough to get your attention? 😉
Embracing automation
Lately, I’ve been helping quality assurance professionals (“QAs”), developers, and observability engineers improve web application reliability by automating:
- Web-browser-based end-to-end testing of web applications as part of their CI/CD-based build/deploy pipelines.
- (Example tools: Selenium, Cypress, Playwright, Puppeteer, Microsoft Power Apps Test Engine, Microsoft Power Apps Test Studio, BrowserStack Automate, BrowserStack Low-Code Automation, etc.)
- Synthetic monitoring, run on a schedule once an application is up and running, to make sure it’s still up and running.
- (Example tools: DataDog Synthetic Monitoring Browser Tests, Splunk Observability Cloud Synthetic Browser Tests, any of the above tools but scheduled instead of run by CI/CD, etc.)
Risks
Below are some identity, authentication (“authN”), and authorization (“authZ”) threat models that have been haunting me lately on account of this work:
Unattended authN
These automations are inherently left unattended once you’ve got them set up and doing useful work.
If these tools’ job is to automatically test a web site that requires authentication (“authN” / login), then these unattended tools will need to be empowered to authenticate into it 24 hours a day, 7 days a week.
How many staff have the ability to configure the testing tool?
One of them might eventually get phished and hacked. It happens.
Now a hacker can rewrite the automated tests to do something malicious, using the testing tool’s 24/7 ability to log into the web site.
Logged authN
If one of these automations fails to pass the expectations it’s been programmed to validate, you’ll want to figure out why it failed. Logs help.
It’s tempting to log everything that happened within the web browser that the automation was using while it ran the test.
But guess what that would include?
Browser cookies.
I discovered this the hard way, myself.
While consulting for some QAs, I logged myself into an employee+vendor self-service portal, then recorded an instance of Playwright working its way through demonstrating that yes, my name showed up where it would be expected to show up.
I was thrilled to show these QAs how rich and useful Playwright’s browser trace logs are.
As I prepared to email them a link to the logs from my Playwright session, it suddenly occurred to me that Playwright’s trace logs included everything from Chrome’s developer tools – including a recording of my network activity.
Had a hacker gotten ahold of my recording (not impossible; I’d published it to the open internet to facilitate my demo, figuring it was fine if some random bot found it and learned that … my name exists), or had one of the QAs been clever enough, they could have exfiltrated a copy of the browser cookies that proved to the self-service portal’s server that I’m me.
Worse yet, because I had forgotten to explicitly program Playwright to log me out at the end of the recording, these cookies were still valid.
Anyone with my Playwright trace logs could have used the recorded cookies to log their own browser into the self-service portal as me and viewed or even changed my direct deposit bank account.
Would I eventually notice bank fraud and have a chance to fix it? Probably.
But what if instead of hacking into an employee and vendor self-service portal, the ability to impersonate the account I used when recording a Playwright “trace” had allowed them to do something equally dangerous but more subtle? Perhaps just as part of one of many steps in a chain of lateral movements in a serious hack. Read more on that below under “implicit authZ.”
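These days, I end every recorded session logged out. Here’s a minimal Playwright sketch of that guardrail; the “Log out” link name is hypothetical, so match it to the app under test:

```ts
import { test } from '@playwright/test';

// Runs after every test in this file: end the recorded session logged out.
test.afterEach(async ({ page }) => {
  // The logout click is what actually invalidates the server-side session.
  // (Hypothetical link name; adjust to your application.)
  await page.getByRole('link', { name: 'Log out' }).click();
  // Belt and suspenders: clear the context's cookies too. On its own, this
  // would NOT invalidate the session that the trace already captured.
  await page.context().clearCookies();
});
```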
Logged authZ-protected data
My Playwright mistake before that one was even funnier.
The QAs in question had initially been concerned about a bug on the direct deposit self-service page, so that’s where I’d started with making a Playwright demo.
To show the debugging power of Playwright, I turned on videos, screenshots, and trace logs while recording.
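For reference, here’s roughly the configuration that bit me. All three of these Playwright options persist what was on screen and on the wire:

```ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    video: 'on',      // records everything rendered on screen
    screenshot: 'on', // captures the page as each test finishes
    trace: 'on',      // captures network activity, including cookies
  },
});
```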
Yes, I almost emailed a whole team of near-strangers a video recording displaying my bank account number.
Mitigation ideas
All of these risks have me thinking hard about defense in depth.
I’m becoming convinced that it’s essential to design identity, authN, and authZ for automated testing on a presumption that it’s a question of when, not if, you’ll have an authN leak.
I believe that you should separate identities and minimize authZ privilege grants so intensely that a dedicated “test account’s” credentials, and the URL of the web site they’re meant to test, could be published on the front page of the New York Times.
And then, nothing bad should happen to your company, other than suffering a spike in network traffic.
- (Speaking of which, have your team and your enterprise at large hardened everything so that the traffic spike, and any system crashes it might directly or indirectly cause, wouldn’t expose some secondary vulnerability, similar to a buffer overflow attack?)
Privilege separation
- For each “test account” that you consider creating, consider whether it might make sense to create different versions of it to be used with different testing tools. For example, you might want to make separate identities for your “synthetic monitoring” automations to use vs. your “CI/CD regression testing” automations to use (see the configuration sketch after this list).
- The synthetic monitoring tool will probably be spot-checking, not doing thorough tests, so giving it its own identity allows you to follow the principle of least privilege and grant the monitoring identity fewer privileges/authZ than you’d grant the regression-testing identity. Similarly, maybe automated CI/CD-time testing doesn’t need as much privilege granted to it as humans would need when manually testing an application, so those should probably be separate “test accounts,” too.
- The synthetic monitoring tool will probably be running against your production environments, so following the principle of least privilege becomes particularly important for the identity used with synthetic monitoring.
- Whether you are automating testing or doing it manually, would it perhaps help to create different “test accounts” designed for testing different aspects of your application, each of which has a different set of privileges/authZ? Talk it through with your team, because there is a tradeoff of increased complexity to manage. This is an “it depends” situation; don’t decide this one alone, if you can avoid it.
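As one way to wire that up, here’s a hedged sketch of a single Playwright config that runs tests under two separate identities. The project names and storage-state file paths are illustrative:

```ts
// playwright.config.ts (sketch: names and paths are hypothetical)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      // Scheduled spot checks against production: least privilege.
      name: 'synthetic-monitoring',
      use: { storageState: 'auth/monitoring-account.json' },
    },
    {
      // Thorough pre-deploy regression tests: a separate identity,
      // granted only the authZ those tests genuinely need.
      name: 'ci-regression',
      use: { storageState: 'auth/regression-account.json' },
    },
  ],
});
```

Each storage-state file would be produced by a separate login setup step using that identity’s own CI secret, so neither identity’s credentials ever touch the other tool.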
Least privilege
Explicit authZ
Maybe the bug you’re interested in can be found on a less sensitive part of the web application you need to test.
In my case, I was able to also find the “direct deposit page” bug on the page that simply displays an employee/vendor name, so I made my Playwright test only visit that page.
Ideally, if I’d had a standalone identity for “testing” purposes to log in with, people would have long since figured out that the bug also exists on the “display your own name page.”
And then while logged into the self-service portal with it, the portal simply wouldn’t have showed me the “direct deposit” page as an option, because that identity wouldn’t have had any authZ for managing direct deposit accounts.
Synthetic data
Can you structure your application’s real data so that “real data” and “fake data” can live side-by-side in the same database without accidentally getting mixed up with each other? (e.g. totals and averages accidentally counting the wrong data)
- Read: If you’re making a “supervisor portal,” can the “fake bank branch teller” that you created as a “test account” pull up only … fake people’s fake bank accounts?
- That could also help solve the problem of accidentally capturing sensitive data in video recordings of browser-based automated tests.
- Write: And if any teller flagged as fake tries to make a deposit, does the application know to flag it as one of the “fake transactions,” not as a “real transaction”?
From what I understand, keeping some “synthetic data” around that’s meant to be used by “synthetic accounts,” and authoring your application to know to discard “fake transactions” when it recognizes them, is a common practice when designing applications for “synthetic monitoring” of production systems.
Why not just … also try that strategy for your CI/CD-based regression testing, while you’re at it?
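Here’s a minimal sketch of that idea in TypeScript; every name in it (Transaction, isSynthetic, and so on) is hypothetical, not from any particular framework:

```ts
// Hypothetical sketch of "real" and "fake" data living side-by-side.
interface Transaction {
  accountId: string;
  amountCents: number;
  isSynthetic: boolean;
}

// Write path: anything a flagged test identity does is itself flagged.
function recordTransaction(
  actorIsTestAccount: boolean,
  tx: { accountId: string; amountCents: number }
): Transaction {
  return { ...tx, isSynthetic: actorIsTestAccount };
}

// Read path: aggregates only ever see real rows, so fake deposits
// can't skew branch totals or averages.
function branchTotalCents(transactions: Transaction[]): number {
  return transactions
    .filter((tx) => !tx.isSynthetic)
    .reduce((sum, tx) => sum + tx.amountCents, 0);
}
```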
Unauthenticated UI testing
Come to think of it, do you even need to be testing any of your web site’s user interface against real data, if you control the whole application’s codebase?
If you control the application’s implementation, then just CI/CD-regression-test its frontend user interface (“UI”) against a mocked-out backend if you want to know whether:
- it passes WCAG audits
- modals open and close with keyboard controls
- tab orders cycle correctly with keyboard controls
- when you click “add to cart” and the (in this case, mocked) API sends back a fresh “order subtotal” value, the UI’s “order details” widget’s “subtotal” display increases appropriately
- etc.
Remember: if you have the opportunity to control how “http://localhost” behaves when you spin up your frontend, then hypothetically, your localhost build can simply have a mode that doesn’t require authentication and always uses a mock backend.
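To make that concrete, here’s a hedged Playwright sketch of the “add to cart” bullet from the list above, assuming a hypothetical /api/cart/add endpoint, response shape, and “subtotal” test ID:

```ts
import { test, expect } from '@playwright/test';

test('add to cart updates the subtotal', async ({ page }) => {
  // Mock the backend: no real authN, no real data. The endpoint path,
  // response shape, and test ID are all hypothetical.
  await page.route('**/api/cart/add', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ orderSubtotal: '24.99' }),
    })
  );

  await page.goto('http://localhost:3000'); // frontend running in mock mode
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await expect(page.getByTestId('subtotal')).toHaveText('$24.99');
});
```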
(You could even take this idea lower in the testing pyramid and do a lot of your UI testing right within the build process for the programming framework that renders the frontend in the first place. Karma, Vitest, Jest, JSDom, Playwright Component Testing, etc. – who even really needs a full-on classic “localhost”?)
Then the only true end-to-end test you would need to run as part of CI/CD-triggered regression testing is validating that login works. But that doesn’t require much authZ at all – you could restrict permission grants so that the only thing your “test account’s” identity can do, when authenticated into your system, is view a “welcome to the test page – yes, you’re logged in” special URL. Now that’s least privilege!
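A minimal sketch of that lone login test, with hypothetical form labels, URL, and secret names:

```ts
import { test, expect } from '@playwright/test';

test('login works', async ({ page }) => {
  await page.goto('https://example.com/login'); // placeholder URL
  // Credentials come from CI secrets, never from the repo. The variable
  // names and form labels here are hypothetical.
  await page.getByLabel('Username').fill(process.env.TEST_USERNAME!);
  await page.getByLabel('Password').fill(process.env.TEST_PASSWORD!);
  await page.getByRole('button', { name: 'Log in' }).click();
  // The only page this identity is authorized to see:
  await expect(
    page.getByText('welcome to the test page – yes, you’re logged in')
  ).toBeVisible();
});
```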
Health check URLs
Actually, synthetic monitoring isn’t usually interested in re-validating WCAG compliance and whatnot (it figures if the build-time CI/CD-driven regression tests passed, it’s not like things are going to just suddenly move around on the page of their own accord, so that’s not important to re-test every few minutes).
So another approach to protecting synthetic monitoring – perhaps especially in the realm of repeatedly validating authenticated APIs’ health, but sometimes in the realm of validating minor UI things – is to simply add a URL to your application that’s named something along the lines of “example.com/health-check.”
- You might not even require authentication to visit the “health check” URL.
- Or you might require authentication, but you create a dedicated “can do health checks” authZ/privilege/role and grant nothing but that permission to the identity that your synthetic monitoring tool logs into the health check URL with.
- This is the same idea as the “yes, you’re logged in” URL idea from above.
- It’s just that in this variation, the special URL’s contents also display a bit more detail like:
- “yes, under the hood I, the application server, validated that checkout is still up”
- “yes, under the hood I, the application server, validated that currency conversion is still up”
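Here’s a minimal sketch of such an endpoint, assuming Express and hypothetical probe functions:

```ts
import express from 'express';

const app = express();

// Hypothetical probes: swap in real checks of your own subsystems.
async function checkoutIsUp(): Promise<boolean> {
  return true; // e.g. ping the checkout service
}
async function currencyConversionIsUp(): Promise<boolean> {
  return true; // e.g. run a known-answer currency conversion
}

app.get('/health-check', async (_req, res) => {
  const [checkout, currencyConversion] = await Promise.all([
    checkoutIsUp(),
    currencyConversionIsUp(),
  ]);
  const healthy = checkout && currencyConversion;
  res.status(healthy ? 200 : 503).json({ checkout, currencyConversion });
});

app.listen(3000);
```

Returning a 503 whenever any probe fails gives the synthetic monitoring tool a single, cheap signal to alert on.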
Implicit authZ
It’s not even just the “happy path” of protecting your web application itself that you need to think through.
What if there are implicit privileges that every authenticated identity has by default, outside of your application?
Has your company – or even the identity provider itself – implicitly granted every identity in your authentication provider the ability to do certain things that have nothing to do with your application?
For example, I’ve heard that any identity created within an on-premises Microsoft Active Directory domain can, by default, list out every other identity that exists within the same domain.
Teams making web applications need to protect the inside of their application, but they also need to partner with experts from their larger enterprise to lock down other implicit authorizations/privileges so that the identities they’re using for testing also can’t attack other systems.