EXPERIMENT READINESS AUDIT
You’ve got the analytics tool, the traffic, and the hypothesis. But your tracking events are misfiring, your sample sizes are too small, and your test designs have validity errors you’d never catch from a results dashboard. The Audit tells you exactly what’s broken — and what to fix first.
5 audit areas checked — go/no-go verdict or refund · 5-day delivery
WHAT THE AUDIT COVERS
Fixed price · 5 business days
We review your tools, data, and test plans. You get a clear list of what's broken and how to fix it, so your results are trustworthy.
MARKETING MANAGER
"Why did our button color test show no change?"
We find that the 'click' event wasn't firing for the new button. Your test dashboard showed no difference, but the data was missing. Now you know to fix the tracking before you trust the result.
PRODUCT LEAD
"Is 500 users enough to test this new feature?"
We calculate that you'd need 2,000 users to see a real effect. Running it now would likely show 'no winner' even if one exists. You save weeks by waiting until you have enough traffic.
DATA ANALYST
The 'control' and 'test' groups aren't actually separate.
We spot that users can be in both groups because of a setup error. This mixes up the results, making them useless. You get the steps to properly isolate your groups for a clean test.
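As an illustration of the kind of check involved, here is a minimal sketch that scans an assignment log for users who appear in more than one group. The log format (pairs of user ID and group name) and the function name `users_in_multiple_groups` are assumptions made for this example, not the export format of any particular tool.

```python
from collections import defaultdict


def users_in_multiple_groups(assignments):
    """Return {user_id: [groups]} for users assigned to more than one group.

    `assignments` is an iterable of (user_id, group) pairs. This shape is a
    hypothetical export format chosen for the example.
    """
    groups_by_user = defaultdict(set)
    for user_id, group in assignments:
        groups_by_user[user_id].add(group)
    # Keep only users who were seen in two or more distinct groups.
    return {
        user_id: sorted(groups)
        for user_id, groups in groups_by_user.items()
        if len(groups) > 1
    }


# Made-up sample data: u3 was exposed to both variants.
assignment_log = [
    ("u1", "control"),
    ("u2", "test"),
    ("u3", "control"),
    ("u3", "test"),
    ("u1", "control"),  # a repeat assignment to the same group is fine
]
contaminated = users_in_multiple_groups(assignment_log)
print(contaminated)
```

Any user returned by this check has seen both experiences, so their behaviour cannot be attributed cleanly to either group.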
WEEKLY REPORTING
Your winning experiment might be a false alarm.
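We check whether your significance threshold and stopping rules are set correctly. A win reported at '95% confidence' is often much less reliable than it looks, for example when results are checked repeatedly and the test is stopped at the first significant reading. You learn which settings to change to stop celebrating false positives.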
A clear go/no-go on whether your setup can produce valid experiments — or a specific list of what to fix first. Refund if we can’t deliver that.
Instrumentation, tracking plan, tool config, sample size, and experiment design. The five areas that determine whether your results are valid.
You get a definitive answer on whether your setup supports valid experiments. No ambiguity, no interpretation required.
YOU ALREADY SUSPECT YOUR RESULTS AREN’T TRUSTWORTHY
Ran a 4-week experiment — results flipped when one parameter changed
“We ran an experiment for a full month. The results said the new onboarding was better. We re-ran it with a different traffic split and the results flipped. Now nobody trusts any of our experiment results.”
VP Product — B2B SaaS, $8M ARR
Tool says ‘significant’ — but the team isn’t sure it’s real
“Our experimentation tool keeps telling us results are statistically significant. But when we look at the numbers, something feels off. The effect sizes don’t match what we see in our analytics. We’re making decisions on results we don’t fully trust.”
Head of Growth — Series B
Tracking events don’t match what users actually do
“We discovered our ‘signup completed’ event was firing twice for some users and not at all for others. We’d been running experiments on that event for six months. Every result from those tests is unreliable now.”
Product Manager — B2B SaaS
No idea if you have enough traffic to trust your results
“We ran a test for three weeks and got a winner. Then someone asked if we’d had enough traffic for the result to be valid. Nobody could answer that. We shipped it anyway. Three months later we’re not sure it actually helped.”
CEO — Seed stage
WHAT THIS TYPICALLY UNCOVERS
Most experiment setups have at least one validity threat that makes results unreliable.
Configuration issues — misfiring events, incorrect randomization, or sample size shortfalls — can silently invalidate results. The dashboards don’t flag these. They only show you the numbers.
Tracking plans rarely map to the metrics teams actually test.
Teams build tracking plans for product analytics, not for experiments. The events you track for engagement dashboards aren’t always the events you need to measure experiment outcomes. The gap between what you track and what you test is where invalid results come from.
Sample size is the most common blind spot in experiment programs.
Teams often calculate sample size once, at the start of a program, using estimated baseline rates. When actual rates differ, and they usually do, the calculated sample size is wrong. Experiments then either end too early to detect a real effect or run longer than they need to and tie up traffic that other tests could have used.
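To show how much the baseline rate matters, here is a minimal sketch of a standard sample size approximation for comparing two conversion rates with a two-sided z-test. The function name, the default 5% significance level and 80% power, and the example rates are all assumptions for illustration. Your tool may use a different method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant (two-sided two-proportion z-test).

    baseline_rate: control conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest relative improvement worth detecting, e.g. 0.20
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical numbers: the plan assumed a 5% baseline, the real rate is 3%.
planned = users_per_variant(baseline_rate=0.05, relative_lift=0.20)
actual = users_per_variant(baseline_rate=0.03, relative_lift=0.20)
print(f"planned: {planned} per variant, actual need: {actual} per variant")
```

With the same relative lift, the lower real baseline needs noticeably more users per variant than the plan assumed. A test sized from the estimate would stop short.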
Tool misconfigurations don’t produce visible errors — they produce wrong results.
An experimentation tool that’s misconfigured doesn’t show a red warning. It shows a result that looks correct but isn’t. The most common issues — duplicate event firing, incorrect assignment units, and filtering that excludes the wrong users — are invisible until someone audits the setup.
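As one example of what such an audit looks for, the sketch below counts how often a conversion event fired per user and flags users with duplicates and users with no event at all. The event record shape (dicts with `user_id` and `event` keys), the event name, and the function name `audit_event` are hypothetical and chosen only for this example.

```python
from collections import Counter


def audit_event(events, expected_users, event_name):
    """Return (duplicated, missing) user lists for one event.

    events: iterable of dicts with "user_id" and "event" keys (assumed shape)
    expected_users: users known to have completed the action, for example
        from a backend database rather than from the analytics tool
    """
    counts = Counter(e["user_id"] for e in events if e["event"] == event_name)
    duplicated = sorted(u for u, n in counts.items() if n > 1)
    # Counter returns 0 for users that never fired the event.
    missing = sorted(u for u in expected_users if counts[u] == 0)
    return duplicated, missing


# Made-up sample data: u2 fired twice, u3 never fired.
events = [
    {"user_id": "u1", "event": "signup_completed"},
    {"user_id": "u2", "event": "signup_completed"},
    {"user_id": "u2", "event": "signup_completed"},
    {"user_id": "u3", "event": "page_view"},
]
duplicated, missing = audit_event(events, ["u1", "u2", "u3"], "signup_completed")
print("duplicated:", duplicated, "missing:", missing)
```

Neither problem shows up in a results dashboard. Both change the conversion rate the experiment reports.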
WHY THIS IS DIFFERENT
Most teams find out their setup is broken after they’ve already shipped a losing variant. We check before you run your next test.
“Run the experiment and see what happens” assumes your setup produces valid results. It might not. This audit checks whether your tracking events fire correctly, whether your tool is configured for valid experiments, and whether you have enough traffic to detect a realistic effect. You find out before you spend weeks running a test that produces unreliable data.
You get a clear go/no-go: your setup can support valid experiments, or here’s exactly what to fix first. No interpretation required — each finding is tied to a specific configuration change your team can make.
TIMELINE
Read-only access to your analytics and experimentation tools. Your tracking plan and experiment history reviewed. Current setup mapped against what valid experiments require.
Event instrumentation checked against actual user behaviour. Tool configuration reviewed for validity threats. Sample size calculated from your real traffic data. Existing experiment designs reviewed for common errors.
A clear go/no-go on whether your setup supports valid experiments — or a specific list of what to fix first, ranked by impact. Delivered async. No meeting required.
Day 6: you know your next experiment will produce results you can trust.
WHAT YOU GET
Every event your team relies on for experiment measurement is verified: whether it fires correctly, whether the properties are complete, and whether the data it produces is trustworthy. Most teams discover their biggest problem isn't their hypothesis — it's that their data was never clean enough to answer the question.
Your tracking plan is evaluated against the specific metrics you need to run valid experiments. Gaps between what you're tracking and what you need to measure are documented before you waste a test cycle discovering them mid-experiment.
Your experiment tool's configuration is reviewed for settings that silently invalidate results: traffic allocation bugs, logging gaps, causes of sample ratio mismatch (SRM, where the observed split between groups differs from the configured one), and more. These are the failure modes that look like a working test until you check carefully.
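A sample ratio mismatch (SRM) check compares the observed split between groups with the split you configured. A minimal sketch for a two-group test follows, using a chi-square goodness-of-fit test with one degree of freedom. The function name, the user counts, and the 0.001 alert threshold are assumptions for this example, not settings taken from any specific tool.

```python
from math import erfc, sqrt


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """P-value of a chi-square test that the observed split matches the plan.

    Valid for exactly two groups (one degree of freedom), where the
    chi-square survival function equals erfc(sqrt(chi2 / 2)).
    """
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    return erfc(sqrt(chi2 / 2))


ALERT_THRESHOLD = 0.001  # assumed cut-off; choose your own

# Made-up counts for a test configured as a 50/50 split.
healthy = srm_p_value(5000, 5050)
suspicious = srm_p_value(10000, 10600)
print("healthy split flagged:", healthy < ALERT_THRESHOLD)
print("suspicious split flagged:", suspicious < ALERT_THRESHOLD)
```

A flagged split does not say what went wrong. It says the assignment or logging needs investigating before the result is read.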
Given your actual traffic volumes and baseline conversion rates, you'll know whether you have the volume to run valid experiments at all — and if not, what minimum detectable effect size your current traffic can actually support.
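The same question can be asked in reverse: given the users you can realistically put into each variant, what is the smallest effect the test could detect? The sketch below uses a common approximation that treats both groups as having the baseline variance. The function name, the defaults of 5% significance and 80% power, and the example traffic figures are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, users_per_variant,
                              alpha=0.05, power=0.80):
    """Approximate absolute MDE for a two-sided two-proportion z-test.

    Simplification: both variants are assumed to have the baseline variance,
    so treat the result as a rough guide rather than an exact figure.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    standard_error = sqrt(
        2 * baseline_rate * (1 - baseline_rate) / users_per_variant
    )
    return z * standard_error


# Hypothetical traffic: 2,000 vs 20,000 users per variant at a 5% baseline.
small = minimum_detectable_effect(0.05, 2_000)
large = minimum_detectable_effect(0.05, 20_000)
print(f"2,000 per variant: {small:.4f} absolute ({small / 0.05:.0%} relative)")
print(f"20,000 per variant: {large:.4f} absolute ({large / 0.05:.0%} relative)")
```

If the smallest detectable effect is larger than any change you could plausibly ship, the test is not worth running at that traffic level.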
Between 3 and 5 of your recent experiments are reviewed for design problems: ambiguous hypotheses, wrong primary metrics, underpowered samples, and early stopping. You'll see exactly what went wrong and how to avoid it next time.
A complete, structured report on the state of your tracking, with each event's status clearly documented: clean, problematic, or broken — and what each problem means for experiment validity.
A written assessment of your tracking plan against what valid experimentation requires, with each gap documented and a recommended fix.
A documented list of every configuration issue found, ranked by how severely each one affects experiment validity, with recommended remediation steps.
A sample size calculator configured for your actual traffic and baseline rates, so anyone on your team can check whether a proposed test is adequately powered before it runs.
A written critique of each reviewed experiment, documenting what was done correctly, what introduced validity risk, and what would have produced a more reliable result.
A clear, direct verdict: are you ready to run valid experiments right now? If yes, you get a prioritised fix list for the remaining gaps. If no, you get a specific list of what needs to change before you proceed.
The full written audit, including the verdict and every finding, plus a sequenced fix list for proceeding to experiments.
A live walkthrough of every finding, recorded for your team. For 15 days after delivery, questions about fixes and implementation get answered directly.
Everything above for $997. No hourly billing. No scope creep. Everything stays with your team.
FIT CHECK
The situation
You’re running A/B tests, or about to, but nobody has confirmed that your tracking events fire correctly, your tool configuration supports valid randomization, or your traffic is enough to detect a realistic effect in a reasonable window. You have an analytics tool and a testing tool. What you don’t have is confidence that the results they produce are reliable enough to act on.
What you leave with
Your next experiment produces results you can trust — because the setup was verified before you ran it.
When this audit doesn’t apply
If you don’t have an analytics tool collecting event data, there’s nothing to audit — the instrumentation doesn’t exist yet. If you need someone to build your experiment program from scratch, this audit tells you what’s missing but doesn’t build it. And if your analytics implementation itself is broken, you need that fixed before checking whether experiments will run correctly on top of it.
Better starting points
The Experiment Readiness Audit checks whether your setup can produce valid experiment results. Your team runs the experiments and builds the program. If you need implementation or ongoing support, that’s a different engagement.
Jake McMahon — ProductQuant
I run this audit myself. The event instrumentation check, the tracking plan review, the tool configuration audit, the sample size calculations, the experiment design critique — all of it. Your experiment setup is not generic. It’s specific to your product, your tracking architecture, and the gap between what your analytics tool reports and what actually happened. Generic checklists tell you to “verify your events” without telling you which events matter for experiment validity.
The audit produces a verdict your team can act on. The instrumentation gaps tell your engineer what to fix. The sample size calculations tell your PM which experiments to run — and which not to. The experiment design critique tells your growth lead what to change before burning a quarter of traffic on an invalid test. No translation required — every finding is formatted for the person who needs to act on it.
PRICING
Clear go/no-go verdict — or specific list of what to fix, refund if we can’t deliver.
Book the Audit →
Clear go/no-go verdict on whether your setup supports valid experiments, or a specific list of what to fix. If we cannot deliver that verdict, we refund the audit. You either get a definitive answer or you don’t pay.
Your tracking verified. Your tool configuration checked. Your sample sizes calculated from real data. A go/no-go verdict you can trust — or a specific list of what to fix.