Data Science . May 03, 2023 . 2 MIN

Schaun Wheeler

Our new paper on arXiv shows how we developed an adaptive synthetic control group to evaluate our continuously adaptive testing.

New on arXiv: validating Aampe performance with adaptive synthetic control groups

When people start using Aampe, they naturally want to understand how the system is performing. And when they ask how the system is performing, they naturally understand that Aampe can’t take credit for every app visit, add to cart, or purchase that happens in close proximity to a message being sent. There’s a baseline level of activity that you can expect on any app: the activity that will happen if you do no messaging at all. Estimates of the effectiveness of a particular messaging strategy should take that baseline into account.

One common way to validate performance is with a holdout group. There are a variety of reasons why this is not a great idea, perhaps the clearest one being that it creates apples-to-oranges comparisons when you’re adapting your messaging over time. Variations on a holdout group, such as a switchback holdout or a synthetic control, don’t solve the fundamental problem: when you’re constantly changing your messaging in reaction to user responses, any holdout has a very limited shelf life. Holdouts work well for single A/B tests that have a specific start date and stop date. They don’t work well for continuous adaptation.

That’s why we developed an adaptive synthetic control group, based on the same theoretical foundation as the Coarsened Exact Matching method developed by Gary King and his colleagues at Harvard. The method works by binning users into categories and then matching users who received a message with users who occupy the same bin but did not receive a message. We designate a monitoring window for each individual message sent, and then identify a control user: someone who was at a similar level of activity on the app and who received a similar message recently, but who did not receive any message whose monitoring window overlaps with that of the message for which we’re seeking a control.
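To make the matching step a little more concrete, here is a minimal Python sketch of the idea described above. It is an illustration under stated assumptions, not the implementation from the paper: the `Message` fields, the 24-hour monitoring window, the 72-hour definition of "recently", and the `find_control` helper are all hypothetical, and the activity bin is simplified to a label recorded at the time each message was sent.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Message:
    user_id: str
    sent_at: float     # hours since an arbitrary reference time (assumed unit)
    activity_bin: str  # coarsened activity level at send time (assumed labels)
    message_type: str  # coarsened message category (assumed labels)


WINDOW_HOURS = 24.0    # assumed length of the monitoring window
LOOKBACK_HOURS = 72.0  # assumed meaning of "recently"


def windows_overlap(a_start: float, b_start: float) -> bool:
    """Two equal-length windows overlap when their starts are closer than the length."""
    return abs(a_start - b_start) < WINDOW_HOURS


def find_control(treated: Message, messages: Iterable[Message]) -> Optional[str]:
    """Return the id of one eligible control user for `treated`, or None."""
    by_user = {}
    for m in messages:
        by_user.setdefault(m.user_id, []).append(m)

    for user_id, sent in sorted(by_user.items()):
        if user_id == treated.user_id:
            continue
        # Exclude users with any message whose window overlaps the treated window.
        if any(windows_overlap(m.sent_at, treated.sent_at) for m in sent):
            continue
        # Require a similar message, sent recently, from the same activity bin.
        for m in sent:
            recent = 0 < treated.sent_at - m.sent_at <= LOOKBACK_HOURS
            same_bin = (m.activity_bin == treated.activity_bin
                        and m.message_type == treated.message_type)
            if recent and same_bin:
                return user_id
    return None


messages = [
    Message("u1", 100.0, "high", "promo"),  # the treated message
    Message("u2", 90.0, "high", "promo"),   # window overlaps u1's: excluded
    Message("u3", 60.0, "high", "promo"),   # same bin, recent, no overlap
    Message("u4", 50.0, "low", "promo"),    # different activity bin
]
treated = messages[0]
control = find_control(treated, messages)
print(control)
```

In this toy data, `u3` is the only user who shares the treated user's bin, received a similar message within the lookback period, and has no overlapping monitoring window. A real system would also need to decide how to pick among several eligible controls; this sketch simply returns the first one in sorted order.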

This allows us to evaluate the performance of our personalization scores (the metrics we use to determine which messaging choices an individual user will respond to).

It also allows us to calculate precise attribution estimates, which we expose in our Composer tool.

You can find more details about the method and our validation techniques in our technical paper. The full paper is on arXiv.

