CUPED (short for Controlled-experiment Using Pre-Experiment Data) is a technique that uses information about users from before an experiment to reduce the variance of experiment metrics, which narrows their confidence intervals. It can also help correct experiments that have a meaningful pre-exposure bias (e.g., the groups happened to differ before any treatment was applied).


Variance Reduction and CUPED

Variance is a measure of dispersion: it quantifies the amount of "noise" in a metric or in experiment results. Higher variance produces wider confidence intervals, so an experiment needs a larger sample size to consistently detect a statistically significant result for the same effect size.

Because lower variance means a smaller required sample, reducing variance can shorten experiment run times. For this reason, techniques have been developed to reduce the variance in experiment results, thereby reducing run times and increasing confidence.
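The link between variance and required sample size can be made concrete with a rough calculation. The sketch below uses the textbook fixed-horizon formula for a two-sided, two-sample z-test. This is an assumption made for illustration: it is not Intempt's sequential method, and the function name and example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(variance, min_effect, alpha=0.05, power=0.8):
    """Approximate users needed per group (two-sided, two-sample z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    return ceil(2 * (z_alpha + z_beta) ** 2 * variance / min_effect ** 2)


# Same minimum detectable effect, with and without a 40% variance reduction.
baseline = required_sample_size(variance=4.0, min_effect=0.1)
reduced = required_sample_size(variance=2.4, min_effect=0.1)
print(baseline, reduced, reduced / baseline)
```

Under this formula the required sample size is proportional to the variance, so cutting variance by a given fraction cuts the required sample by roughly the same fraction.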

At Intempt, we use CUPED, which is applied automatically when computing an experiment's topline results. In practice, this leads to significant variance reduction for most metrics where CUPED can be applied.
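To illustrate the idea, here is a minimal sketch of the standard CUPED adjustment from the literature: each user's metric is shifted by theta times the difference between that user's pre-experiment value and the average pre-experiment value. It is not a description of Intempt's internal implementation. The function name, the synthetic data, and the use of the same metric measured before the experiment as the covariate are all assumptions.

```python
import numpy as np


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values: Y - theta * (X - mean(X))."""
    metric = np.asarray(metric, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # theta = cov(X, Y) / var(X) minimizes the variance of the adjusted metric.
    theta = np.cov(covariate, metric)[0, 1] / np.var(covariate, ddof=1)
    return metric - theta * (covariate - covariate.mean())


# Synthetic users whose in-experiment metric depends on past behavior.
rng = np.random.default_rng(0)
n = 10_000
pre = rng.normal(10.0, 3.0, size=n)               # pre-experiment value per user
post = 0.8 * pre + rng.normal(0.0, 1.5, size=n)   # in-experiment value per user

adjusted = cuped_adjust(post, pre)
print("variance before/after:", post.var(ddof=1), adjusted.var(ddof=1))
print("mean before/after:", post.mean(), adjusted.mean())
```

The adjustment leaves the overall mean unchanged and removes only the part of the noise that pre-experiment data explains. In a real experiment, theta and the covariate mean would typically be computed on data pooled across all groups so that every group is adjusted in the same way.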

📘 Good to know

Under the right conditions, CUPED can be equivalent to getting 20% or more additional traffic during your experiment.

  • In 2016, Netflix reported that CUPED reduced variance by roughly 40% for some key engagement metrics (source).
  • In 2022, Microsoft reported that, for one product team, CUPED was akin to adding 20% more traffic to the analysis of a majority of metrics (source).
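Variance reduction and "extra traffic" are two ways of stating the same gain. The conversion below is a rough sketch that assumes the variance of a metric's estimate shrinks in proportion to 1/n, where n is the number of users. The helper names are hypothetical.

```python
def equivalent_traffic_multiplier(variance_reduction):
    """Traffic multiplier giving the same precision as a variance reduction."""
    if not 0 <= variance_reduction < 1:
        raise ValueError("variance_reduction must be in [0, 1)")
    return 1.0 / (1.0 - variance_reduction)


def variance_reduction_for_traffic(multiplier):
    """Variance reduction that matches multiplying traffic by `multiplier`."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return 1.0 - 1.0 / multiplier


print(round(equivalent_traffic_multiplier(0.40), 2))   # 1.67
print(round(variance_reduction_for_traffic(1.20), 3))  # 0.167
```

Under this assumption, a 40% variance reduction corresponds to roughly 67% more traffic, and 20% more traffic corresponds to a variance reduction of about 17%.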

Configuring CUPED

In Experiments > Metrics > Statistical setup, under "Confidence interval method," you can choose "Sequential testing" or "Sequential testing with CUPED."

Selecting the latter enables CUPED variance reduction for your experiment.

If the experiment is already running, Intempt will recalculate its results with the CUPED adjustment applied.

Where CUPED works best

  • CUPED works best on metrics and behaviors that are predictable from past behavior. In particular, if a metric is consistent over time for the same user, CUPED can be very effective.
  • CUPED also acts as a partial fix for pre-exposure bias. If one group has a systematic bias in its pre-exposure data (which is independent of the experiment group its users are assigned to), that group's metric value will be adjusted towards the population mean.
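How much CUPED helps depends on how strongly a metric correlates with its pre-experiment value. With the standard single-covariate adjustment, the fraction of variance removed equals the squared correlation between the two. The sketch below checks this on synthetic data. The function name and the data are assumptions for illustration only.

```python
import numpy as np


def cuped_variance_reduction(metric, covariate):
    """Fraction of the metric's variance removed by the CUPED adjustment."""
    theta = np.cov(covariate, metric)[0, 1] / np.var(covariate, ddof=1)
    adjusted = metric - theta * (covariate - covariate.mean())
    return 1.0 - adjusted.var(ddof=1) / metric.var(ddof=1)


rng = np.random.default_rng(1)
n = 50_000
pre = rng.normal(size=n)     # pre-experiment values
noise = rng.normal(size=n)   # behavior not explained by the past

results = {}
for rho in (0.0, 0.3, 0.6, 0.9):
    # Build an in-experiment metric whose correlation with `pre` is about rho.
    post = rho * pre + np.sqrt(1 - rho**2) * noise
    results[rho] = cuped_variance_reduction(post, pre)
    print(f"correlation ~{rho}: variance reduction {results[rho]:.3f}")
```

A correlation near 0.9 removes about 80% of the variance, while a correlation of 0.3 removes under 10%, and an uncorrelated metric gains essentially nothing.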

Where CUPED is less effective

  • CUPED does not work for new users, because there is no pre-exposure data to leverage.
  • CUPED is not applied retroactively to newly created metrics, or to metrics that were newly added as primary or secondary metrics.
  • CUPED is less effective if a user's metric value is uncorrelated with their historical behavior.