EXPERIMENT READINESS AUDIT

Jake McMahon — ProductQuant
8+ years B2B SaaS · Behavioural Psychology + Big Data (Masters)

Know whether your experiment setup produces valid results — before you run your next test.

You’ve got the analytics tool, the traffic, and the hypothesis. But your tracking events are misfiring, your sample sizes are too small, and your test designs have validity errors you’d never catch from a results dashboard. The Audit tells you exactly what’s broken — and what to fix first.

5 audit areas checked — go/no-go verdict or refund · 5-day delivery

WHAT THE AUDIT COVERS

Event instrumentation: Are your tracking events firing correctly and capturing what matters?
Tracking plan: Does your event taxonomy map to the metrics you’d actually test?
Tool configuration: Is your experimentation tool set up to produce valid results?
Sample size: Do you have enough traffic and events to detect a realistic effect with adequate statistical power?
Experiment design: Are your test designs free of common validity errors?

Fixed price · 5 business days

We check your experiment setup for hidden problems.

We review your tools, data, and test plans. You get a clear list of what's broken and how to fix it, so your results are trustworthy.

MARKETING MANAGER

"Why did our button color test show no change?"

We find that the 'click' event isn't firing for the new button. Your dashboard showed no difference because the variant's clicks were never recorded. Now you know to fix the tracking before you trust the result.

PRODUCT LEAD

"Is 500 users enough to test this new feature?"

We calculate that you'd need 2,000 users to reliably detect the effect you're looking for. Running it now would likely show 'no winner' even if one exists. You avoid weeks of inconclusive testing by waiting until you have enough traffic.

DATA ANALYST

The 'control' and 'test' groups aren't actually separate.

We spot that users can be in both groups because of a setup error. That contaminates the comparison, so the results can't tell you which version actually performed better. You get the steps to properly isolate your groups for a clean test.
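One common way to keep groups separate is deterministic assignment: hash a stable user ID together with an experiment key, so the same user always lands in the same group no matter how many times assignment runs. (Assigning by something that changes between visits, such as a session, is one possible cause of the overlap described above.) The sketch below assumes a string user ID and a 50/50 split; `assign_variant` and the experiment key are hypothetical names for illustration, not any specific tool's API.

```python
import hashlib


def assign_variant(user_id: str, experiment_key: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'test'.

    The same (user_id, experiment_key) pair always produces the same hash,
    so one user can never appear in both groups of the same experiment.
    """
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    # Treat the first 8 hex characters (32 bits) as a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < treatment_share else "control"


# Usage: assign 10,000 hypothetical users and check the split is close to 50/50.
groups = [assign_variant(f"user-{i}", "onboarding-v2") for i in range(10_000)]
print("share in test:", groups.count("test") / len(groups))
```

Including the experiment key in the hash means a user's group in one experiment does not determine their group in the next.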

WEEKLY REPORTING

Your winning experiment might be a false alarm.

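We check whether your significance threshold and analysis settings are correct. A '95% confident' win is often much less reliable than it looks, for example when results are checked every day or read across many metrics. You learn how to adjust your settings to stop celebrating false positives.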
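One well-known way a '95% confident' result becomes less reliable is reading several metrics at the same 0.05 threshold. Each extra comparison raises the chance that at least one shows a 'win' by luck alone. The sketch below computes that family-wise error rate and the Bonferroni-adjusted per-test threshold. It assumes the comparisons are independent and that none of the tested changes has a real effect. The function names are illustrative.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """P(at least one false positive) across independent tests with no real effect."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_threshold(alpha: float, comparisons: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / comparisons


for k in (1, 5, 10, 20):
    print(f"{k:>2} metrics: P(at least one false win) = {family_wise_error_rate(0.05, k):.2f}, "
          f"Bonferroni per-test threshold = {bonferroni_threshold(0.05, k):.4f}")
```

With 10 metrics checked at 0.05, the chance of at least one false win is about 40%. Repeatedly peeking at a running test inflates the error rate in a similar way, although that case needs sequential-testing methods rather than a simple Bonferroni correction.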

DELIVERY
1 verdict

A clear go/no-go on whether your setup can produce valid experiments — or a specific list of what to fix first. Refund if we can’t deliver that.

SCOPE
5 areas

Instrumentation, tracking plan, tool config, sample size, and experiment design. The five areas that determine whether your results are valid.

GUARANTEE
Go/No‑Go

You get a definitive answer on whether your setup supports valid experiments. No ambiguity, no interpretation required.

YOU ALREADY SUSPECT YOUR RESULTS AREN’T TRUSTWORTHY

Ran a 4-week experiment — results flipped when one parameter changed

“We ran an experiment for a full month. The results said the new onboarding was better. We re-ran it with a different traffic split and the results flipped. Now nobody trusts any of our experiment results.”

VP Product — B2B SaaS, $8M ARR

Tool says ‘significant’ — but the team isn’t sure it’s real

“Our experimentation tool keeps telling us results are statistically significant. But when we look at the numbers, something feels off. The effect sizes don’t match what we see in our analytics. We’re making decisions on results we don’t fully trust.”

Head of Growth — Series B

Tracking events don’t match what users actually do

“We discovered our ‘signup completed’ event was firing twice for some users and not at all for others. We’d been running experiments on that event for six months. Every result from those tests is unreliable now.”

Product Manager — B2B SaaS

No idea if you have enough traffic to trust your results

“We ran a test for three weeks and got a winner. Then someone asked if we’d had enough traffic for the result to be valid. Nobody could answer that. We shipped it anyway. Three months later we’re not sure it actually helped.”

CEO — Seed stage

WHAT THIS TYPICALLY UNCOVERS

The biggest validity threat is almost always invisible from your dashboard.

Most experiment setups have at least one validity threat that makes results unreliable.

Configuration issues — misfiring events, incorrect randomization, or sample size shortfalls — can silently invalidate results. The dashboards don’t flag these. They only show you the numbers.

Tracking plans rarely map to the metrics teams actually test.

Teams build tracking plans for product analytics, not for experiments. The events you track for engagement dashboards aren’t always the events you need to measure experiment outcomes. The gap between what you track and what you test is where invalid results come from.

Sample size is the most common blind spot in experiment programs.

Teams often calculate sample size once, at the start of a program, using estimated baseline rates. When actual rates differ, as they usually do, the calculated sample size is wrong. Experiments either stop before they have enough data to detect the effect, or run longer than necessary and tie up traffic you could have used for the next test.
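The sketch below shows why a wrong baseline breaks the plan. It uses the standard normal-approximation formula for a two-sided, two-proportion z-test to compute the number of users needed per group. It assumes 5% significance, 80% power, and a 10% relative lift. The baseline rates are hypothetical examples, not benchmarks.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per group for a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


planned = sample_size_per_group(baseline=0.10, relative_lift=0.10)  # estimated at kickoff
actual = sample_size_per_group(baseline=0.05, relative_lift=0.10)   # baseline observed later
print(f"planned: {planned} per group, actually needed: {actual} per group")
```

In this example, halving the baseline rate roughly doubles the number of users needed to detect the same relative lift. A test sized on the original estimate would stop at about half the data it needs.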

Tool misconfigurations don’t produce visible errors — they produce wrong results.

An experimentation tool that’s misconfigured doesn’t show a red warning. It shows a result that looks correct but isn’t. The most common issues — duplicate event firing, incorrect assignment units, and filtering that excludes the wrong users — are invisible until someone audits the setup.
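Duplicate event firing is one of the checks that can be scripted against exported data. The sketch below assumes events are available as `(user_id, event_name, timestamp)` records. The record layout, the function name, and the five-second window are hypothetical choices, not rules from any particular tool. It keeps the first occurrence of each event and counts repeats from the same user inside the window as likely double fires.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def dedupe_events(events, window=timedelta(seconds=5)):
    """Drop repeats of the same event for the same user within `window`.

    `events` is an iterable of (user_id, event_name, timestamp) tuples.
    The window is measured from the last kept occurrence of that event.
    Returns (kept_events, duplicates_per_user).
    """
    last_kept = {}
    kept = []
    duplicates = defaultdict(int)
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        key = (user_id, name)
        if key in last_kept and ts - last_kept[key] <= window:
            duplicates[user_id] += 1  # likely a double fire, not a second real action
            continue
        last_kept[key] = ts
        kept.append((user_id, name, ts))
    return kept, dict(duplicates)


t0 = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    ("u1", "signup_completed", t0),
    ("u1", "signup_completed", t0 + timedelta(seconds=1)),  # double fire
    ("u2", "signup_completed", t0),
]
kept, dupes = dedupe_events(sample)
print(len(kept), dupes)  # prints: 2 {'u1': 1}
```

This check only catches events that fire too often. Events that never fire cannot be found this way. Finding them requires comparing against a source of truth, such as backend records of completed signups.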

WHY THIS IS DIFFERENT

Most teams find out their setup is broken after they’ve already shipped a losing variant. We check before you run your next test.

“Run the experiment and see what happens” assumes your setup produces valid results. It might not. This audit checks whether your tracking events fire correctly, whether your tool is configured for valid experiments, and whether you have enough traffic to detect a meaningful effect. It does all of this before you spend weeks running a test that produces unreliable data.

You get a clear go/no-go: your setup can support valid experiments, or here’s exactly what to fix first. No interpretation required — each finding is tied to a specific configuration change your team can make.

TIMELINE

From read-only access to a clear verdict on your experiment setup.

DAYS 1-2

Access + Initial Review

Read-only access to your analytics and experimentation tools. Your tracking plan and experiment history reviewed. Current setup mapped against what valid experiments require.

DAYS 3-4

Deep Audit

Event instrumentation checked against actual user behaviour. Tool configuration reviewed for validity threats. Sample size calculated from your real traffic data. Existing experiment designs reviewed for common errors.

DAY 5

Verdict + Report

A clear go/no-go on whether your setup supports valid experiments — or a specific list of what to fix first, ranked by impact. Delivered async. No meeting required.

Day 6: you know your next experiment will produce results you can trust.

WHAT YOU GET

13 deliverables that tell you whether your experiment setup can produce valid results.

Deliverable 01
Event Instrumentation Check Across 50+ Tracking Events

Every event your team relies on for experiment measurement is verified: whether it fires correctly, whether the properties are complete, and whether the data it produces is trustworthy. Most teams discover their biggest problem isn't their hypothesis — it's that their data was never clean enough to answer the question.

Deliverable 02
Tracking Plan Review Against Experiment Metrics

Your tracking plan is evaluated against the specific metrics you need to run valid experiments. Gaps between what you're tracking and what you need to measure are documented before you waste a test cycle discovering them mid-experiment.

Deliverable 03
Tool Configuration Audit for Validity

Your experiment tool's configuration is reviewed for settings that silently invalidate results: traffic allocation bugs, logging gaps, causes of sample ratio mismatch (SRM), and more. These are the failure modes that look like a working test until you check carefully.
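A sample ratio mismatch (SRM) check compares the number of users actually assigned to each group with the split you configured. A very small p-value means assignment or logging is likely broken, so the result shouldn't be trusted. The sketch below runs a chi-square goodness-of-fit test with one degree of freedom for a two-group test, using only the standard library. The 0.001 alarm threshold is a commonly used convention, assumed here for illustration. The user counts are hypothetical.

```python
import math


def srm_p_value(control_users: int, test_users: int, expected_test_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-group split (1 degree of freedom)."""
    total = control_users + test_users
    expected_test = total * expected_test_share
    expected_control = total - expected_test
    chi2 = ((test_users - expected_test) ** 2 / expected_test
            + (control_users - expected_control) ** 2 / expected_control)
    # Survival function of the chi-square distribution with 1 df: P(X > chi2).
    return math.erfc(math.sqrt(chi2 / 2))


p = srm_p_value(control_users=50_000, test_users=48_800)
if p < 0.001:
    print(f"SRM p-value {p:.1e}: investigate assignment before reading results")
```

In this example, a shortfall of 1,200 users on a configured 50/50 split is enough to trigger the alarm. A gap that small is easy to miss on a results dashboard.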

Deliverable 04
Sample Size Readiness Calculation

Given your actual traffic volumes and baseline conversion rates, you'll know whether you have the volume to run valid experiments at all — and if not, what minimum detectable effect size your current traffic can actually support.
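The reverse question, what effect your current traffic can detect, can be approximated with a few lines of code. The sketch below estimates the minimum detectable effect (MDE) as a relative lift. It assumes a two-sided test at 5% significance, 80% power, an even split, and the baseline variance used for both groups, which is a common simplification. The weekly traffic and the 5% baseline conversion rate are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(baseline: float, users_per_group: int,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest relative lift detectable with the given traffic.

    Uses the baseline variance for both groups, a common simplification.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute = z_total * math.sqrt(2 * baseline * (1 - baseline) / users_per_group)
    return absolute / baseline  # relative lift, e.g. 0.25 means +25%


weekly_users = 4_000  # hypothetical eligible traffic, split 50/50
for weeks in (2, 4, 8):
    n = weekly_users * weeks // 2
    print(f"{weeks} weeks: MDE is about {minimum_detectable_effect(0.05, n):.0%} relative lift")
```

Because the MDE shrinks with the square root of the sample, quadrupling the run time only halves the smallest effect you can detect. For low-traffic products, this is often why small UI tweaks cannot be tested at all.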

Deliverable 05
Experiment Design Critique of 3–5 Past Tests

Between 3 and 5 of your recent experiments are reviewed for design problems: ambiguous hypotheses, wrong primary metrics, underpowered samples, and early stopping. You'll see exactly what went wrong and how to avoid it next time.

Deliverable 06
Event Instrumentation Check Report

A complete, structured report on the state of your tracking, with each event's status clearly documented: clean, problematic, or broken — and what each problem means for experiment validity.

Deliverable 07
Tracking Plan Review with Gap Analysis

A written assessment of your tracking plan against what valid experimentation requires, with each gap documented and a recommended fix.

Deliverable 08
Tool Configuration Audit Findings

A documented list of every configuration issue found, ranked by how severely each one affects experiment validity, with recommended remediation steps.

Deliverable 09
Sample Size Readiness Calculator

Configured for your actual traffic and baseline rates, so anyone on your team can check whether a proposed test is adequately powered before it runs.

Deliverable 10
Experiment Design Critique Document

A written critique of each reviewed experiment, documenting what was done correctly, what introduced validity risk, and what would have produced a more reliable result.

Deliverable 11
Go/No-Go Verdict with Specific Reasons

A clear, direct verdict: are you ready to run valid experiments right now? If yes, you get a prioritised fix list for the remaining gaps. If no, you get a specific list of what needs to change before you proceed.

Deliverable 12
Complete Audit Report (10–15 pages) + Fix Priority List

The full written audit, including the verdict and every finding, plus a sequenced fix list for proceeding to experiments.

Deliverable 13
30-Minute Readout Session (Recorded) + 15-Day Clarification Support

A live walkthrough of every finding, recorded for your team. For 15 days after delivery, questions about fixes and implementation get answered directly.

Everything above for $997. No hourly billing. No scope creep. Everything stays with your team.

FIT CHECK

Teams running experiments without confirming their setup works get the most from this.

GOOD FIT
B2B SaaS at $2M–$20M ARR running experiments without validating the setup
Analytics tool in place · experiments live or planned

You’re running A/B tests, or about to, but nobody has confirmed that your tracking events fire correctly, that your tool configuration supports valid randomization, or that your traffic can detect a meaningful effect in a reasonable window. You have an analytics tool and a testing tool. What you don’t have is confidence that the results they produce are reliable enough to act on.

  • A clear go/no-go verdict on whether your setup supports valid experiments
  • A specific list of what to fix, in order, before running your next test
  • Sample size calculations from your actual traffic — not estimates

Your next experiment produces results you can trust — because the setup was verified before you ran it.

NOT A FIT
No analytics tool, no experiment program, or need for full implementation
Wrong stage or different need

If you don’t have an analytics tool collecting event data, there’s nothing to audit — the instrumentation doesn’t exist yet. If you need someone to build your experiment program from scratch, this audit tells you what’s missing but doesn’t build it. And if your analytics implementation itself is broken, you need that fixed before checking whether experiments will run correctly on top of it.

What this audit doesn’t cover

The Experiment Readiness Audit checks whether your setup can produce valid experiment results. Your team runs the experiments and builds the program. If you need implementation or ongoing support, that’s a different engagement.

  • Running experiments — we audit the setup, your team runs the tests
  • Building your experiment program — we validate readiness, not build the program
  • Analytics implementation — we check tool configuration, not build the tracking
  • Ongoing support — the audit is a fixed-scope engagement, not a retainer
For full implementation → Growth LAB

Jake McMahon
8+ years building retention, activation, and growth programs inside B2B SaaS · Behavioural Psychology + Big Data (Masters)

I run this audit myself. The event instrumentation check, the tracking plan review, the tool configuration audit, the sample size calculations, the experiment design critique — all of it. Your experiment setup is not generic. It’s specific to your product, your tracking architecture, and the gap between what your analytics tool reports and what actually happened. Generic checklists tell you to “verify your events” without telling you which events matter for experiment validity.

The audit produces a verdict your team can act on. The instrumentation gaps tell your engineer what to fix. The sample size calculations tell your PM which experiments to run — and which not to. The experiment design critique tells your growth lead what to change before burning a quarter of traffic on an invalid test. No translation required — every finding is formatted for the person who needs to act on it.

I won’t do this:
  • Give you a passing grade on your experiment setup if the data doesn’t support it
  • Audit your analytics implementation when what you need is an Analytics Audit
  • Tell you your sample size is “probably fine” without calculating it from your actual traffic
  • Review experiment designs without checking whether your tool can measure them correctly

Teams Jake has worked with

Gainify
Guardio
monday.com
Payoneer
thirdweb
Canary Mail

PRICING

Get a definitive verdict on your experiment readiness.

$997
one-time · fixed price
5-day async delivery
  • Event instrumentation check — every experiment-critical event verified
  • Tracking plan review — gaps and inconsistencies documented
  • Tool configuration audit — randomization, assignment, and filters checked
  • Sample size readiness — calculated from your actual traffic data
  • Experiment design critique — validity threats flagged before you run
  • Clear go/no-go verdict with specific fixes listed
  • Everything stays with your team permanently

Clear go/no-go verdict or a specific list of what to fix. Refund if we can’t deliver.

Book the Audit →

Clear go/no-go verdict on whether your setup supports valid experiments — or a specific list of what to fix. If we cannot deliver that verdict, we refund the audit. You either get a definitive answer or you don’t pay.

Questions.

Or book a call →
What if we don’t have any experiments running yet?
The audit is designed for exactly that situation. We check your infrastructure — event instrumentation, tool configuration, sample size feasibility — so you know whether your setup can produce valid results before you invest a sprint in running your first test. If the verdict is “not yet,” you get a specific list of what to build first. It’s cheaper to know before your first experiment than to discover the results were unreliable after.
How is this different from an Analytics Audit?
An Analytics Audit checks whether your analytics implementation is correct — events firing, data flowing, dashboards accurate. The Experiment Readiness Audit checks whether your setup can produce valid experiment results — which includes analytics but goes further into randomization integrity, sample size feasibility, and experiment design validity. If your analytics itself is broken, start with the Analytics Audit. If your analytics works but you’re not sure your experiments will, this is the right audit.
What tools does this audit cover?
Any experimentation tool — PostHog, LaunchDarkly, Optimizely, GrowthBook, Statsig, Amplitude Experiment, or a custom setup. The audit checks the configuration principles that apply regardless of platform: randomization method, assignment unit, traffic filters, and event instrumentation. The tool-specific details are covered as part of the configuration audit.
What do we own at the end?
Everything. The instrumentation check, the tracking plan review, the tool configuration audit, the sample size calculations, the experiment design critique, and the go/no-go verdict with the specific fix list. All formatted for your team to act on directly — instrumentation fixes for your engineer, sample size estimates for your PM, design changes for your growth lead. There’s no dependency on ProductQuant after the audit ends.
What’s the guarantee?
You get a clear go/no-go verdict — your setup supports valid experiments, or here’s the specific list of what to fix. If we can’t deliver that verdict because the data doesn’t exist or the tools aren’t configured enough to audit, we tell you within 48 hours and refund the full amount. No conditions. The guarantee is simple: you either get a definitive answer about your experiment readiness or you don’t pay.

Know whether your experiment setup produces valid results — before you run your next test.

Your tracking verified. Your tool configuration checked. Your sample sizes calculated from real data. A go/no-go verdict you can trust — or a specific list of what to fix.