Walkthrough 3 — Your first A/B experiment
Audience: PM who wants to test which approval path performs better. Time: ~15 minutes setup + days of data collection. Prerequisites: A workflow already exists (Walkthrough 2).
Goal
Replace a single approval path with a split: 50% of submitted Beneficiaries go through the Manager (slow but careful), 50% auto-approve under a value threshold (fast). After 3 weeks, see which arm wins on cycle time and rejection rate.
Step 1 — Frame the question
The designing-experiments skill forces you to write the question as:
"At [state], do documents that go via [arm A] have a better/worse [metric] than documents via [arm B]?"
If you can't write that sentence cleanly, the experiment isn't ready. Don't skip this step.
Step 2 — Define the experiment
Engineer walks the designing-experiments skill:
- Target DocType: Beneficiary
- Target workflow: Beneficiary Approval
- Where does the split go? On exit from Submitted.
- What's the split? 50/50 (default; you can change it to, e.g., 80/20 if rolling out cautiously).
- What are the two arms?
  - arm_a: Manager Approval (existing path)
  - arm_b: Auto Approval (new path; terminal state with role System Manager)
- What's the metric?
  - Primary: cycle_time_seconds (lower is better)
  - Secondary: outcome (rate of approved vs rejected)
Engineer prints the new workflow JSON. Confirm. API applied. Experiment is now live; new Submitted documents start being assigned.
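
To make the split concrete, here is a minimal sketch of one way a 50/50 assignment could be made deterministic. The walkthrough does not say how arms are actually assigned, so `assign_arm`, the hashing scheme, and the `BEN-…` document names are all assumptions for illustration, not the skill's API.

```python
import hashlib


def assign_arm(experiment_id: str, doc_name: str, pct_arm_a: int = 50) -> str:
    """Return 'arm_a' or 'arm_b' for a document (hypothetical helper).

    Hashing the experiment id together with the document name gives each
    document a stable bucket in 0..99, so repeated calls always agree.
    """
    if not 0 <= pct_arm_a <= 100:
        raise ValueError("pct_arm_a must be between 0 and 100")
    digest = hashlib.sha256(f"{experiment_id}:{doc_name}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "arm_a" if bucket < pct_arm_a else "arm_b"


# Made-up document names, only to show the split is roughly even.
docs = [f"BEN-{i:05d}" for i in range(1000)]
counts = {"arm_a": 0, "arm_b": 0}
for name in docs:
    counts[assign_arm("exp_2026_05_fast_track", name)] += 1
print(counts)
```

In a scheme like this, a document's arm depends only on its name and the experiment id, so looking it up twice never gives two answers. Changing `pct_arm_a` mid-run would move some documents across the boundary, which is one reason to keep the split fixed for the life of an experiment.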
Step 3 — Watch it run
After a few days:
Experiment exp_2026_05_fast_track (started 2026-05-03, running 5 days)
Assignments arm_a: 47 arm_b: 51 total: 98
Conversion arm_a: 89.4% arm_b: 88.2% diff: -1.2pp 95% CI [-13.1, +10.7]
Cycle time arm_a: 4.3h arm_b: 1.1h diff: -3.2h 95% CI [-3.8, -2.6]
Verdict cycle_time strongly favors arm_b.
Conversion difference is not significant — sample size too small.
Continue running until n ≥ 400 per arm.
Don't promote yet. Wait for sample size + clear signal.
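
A rough sketch of where such an interval can come from, assuming a normal-approximation (Wald) interval for the difference of two proportions. The counts 42/47 and 45/51 are back-calculated from the percentages in the report, and the tool may use a different method, so the bounds need not match the report exactly. `n_per_arm` uses the standard two-proportion sample-size approximation at 80% power; the function names are illustrative.

```python
import math

Z_95 = 1.959964      # two-sided 95% normal quantile
Z_POWER_80 = 0.841621  # one-sided quantile for 80% power


def diff_ci(success_a: int, n_a: int, success_b: int, n_b: int, z: float = Z_95):
    """Wald interval for (rate_b - rate_a), returned as (diff, low, high)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff, diff - z * se, diff + z * se


def is_significant(low: float, high: float) -> bool:
    """True when the interval excludes zero."""
    return low > 0 or high < 0


def n_per_arm(p_a: float, p_b: float) -> int:
    """Approximate per-arm sample size needed to tell p_a from p_b."""
    if p_a == p_b:
        raise ValueError("rates must differ to compute a sample size")
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((Z_95 + Z_POWER_80) ** 2 * variance / (p_a - p_b) ** 2)


diff, low, high = diff_ci(42, 47, 45, 51)
print(f"diff={diff * 100:+.1f}pp  CI=[{low * 100:+.1f}, {high * 100:+.1f}]  "
      f"significant={is_significant(low, high)}")
```

With only a few dozen documents per arm, the interval spans more than ten percentage points on either side of the estimate, which is why the conversion difference cannot be called yet.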
Step 4 — Promote when significant
After 3 weeks, suppose arm_b is clearly faster and has an equivalent (or better) approval rate. Promoting arm_b then does the following:
1. Builds a new Stack Workflow Def version with arm_b's path replacing the split state.
2. Sets experiment_status = Promoted B on the old version.
3. Opens a PR against the config repo.
4. New Submitted documents follow the simplified path; in-flight documents continue under their assigned arm.
Step 5 — Merge the PR
The deployer flow takes over. Reviewer + tester run on the simplified workflow. PR merged → bench migrate on prod → workflow updated. Old experiment data preserved in Experiment Assignment for the audit trail.
What I refused to do
- Promote before n ≥ 100 per arm (under-powered).
- Promote when CI on the primary metric crosses 0 (not significant).
- Promote on Friday afternoon without --emergency.
- Drop the losing arm's existing Experiment Assignment rows (audit-log integrity).
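
The first three refusals can be read as a pre-promotion check. The sketch below is illustrative only: `ExperimentSnapshot`, `promotion_blockers`, and the reading of "Friday afternoon" as Friday at or after 12:00 local time are assumptions, not the tool's actual implementation. The fourth refusal (never dropping Experiment Assignment rows) concerns data retention rather than timing, so it is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

MIN_N_PER_ARM = 100  # floor taken from the refusal list above


@dataclass
class ExperimentSnapshot:
    """Hypothetical summary of an experiment's current state."""
    n_arm_a: int
    n_arm_b: int
    primary_ci_low: float
    primary_ci_high: float


def promotion_blockers(snap: ExperimentSnapshot, now: datetime,
                       emergency: bool = False) -> List[str]:
    """Return the reasons promotion should be refused (empty list = OK)."""
    blockers = []
    if min(snap.n_arm_a, snap.n_arm_b) < MIN_N_PER_ARM:
        blockers.append("under-powered: fewer than 100 documents in an arm")
    if snap.primary_ci_low <= 0 <= snap.primary_ci_high:
        blockers.append("not significant: primary-metric CI crosses 0")
    # Assumption: "Friday afternoon" = weekday 4 (Friday), hour >= 12.
    if now.weekday() == 4 and now.hour >= 12 and not emergency:
        blockers.append("Friday afternoon without --emergency")
    return blockers


# Day-5 numbers from the report: 47 vs 51 documents, cycle-time CI [-3.8, -2.6].
day5 = ExperimentSnapshot(47, 51, -3.8, -2.6)
for reason in promotion_blockers(day5, datetime(2026, 5, 8, 15, 0)):
    print("refuse:", reason)
```

Returning every blocker rather than stopping at the first makes it easier to see in one pass what still has to change before a promotion is accepted.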
Common questions
| Q | A |
|---|---|
| Can I run two experiments at once? | On different workflows, yes. On the same workflow, no — assignment becomes ambiguous. |
| Can I change the split mid-run? | No. Doing so resets all existing assignments — bad analysis. End the experiment, define a new one. |
| What if a document gets stuck in arm_a? | Same as any stuck workflow — fix transitions, run /frappe-stack:diff, push. |
| Can I see which arm a specific doc is in? | Yes. Every doc gets an experiment_arm Custom Field, added automatically and visible in the form. |