Transactional apps, and particularly e-commerce apps, present a unique opportunity to determine ROI very precisely: how much an app needs to move the needle to justify the cost and effort it takes to engage users.

Aampe boosts targeted in-app behaviors

Here’s something we recently did for one of our e-commerce customers:

The plot shows the daily lift in three key app events over roughly a month and a half of Aampe’s learning systems managing the timing of user engagement messaging. For the first couple of weeks, Aampe’s systems randomly selected when each user would receive a message. We don’t take credit for results that can be easily produced by random chance, so you can see that the lift hovers around zero during those two weeks. Then, at the end of October, the system shifts from full-time learning to applying the lessons learned. From that point on, we saw an average daily lift of over 20%, sometimes much higher, and both the short and long blackout periods requested by the customer didn’t interrupt our ability to maintain those results.
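
How that lift gets measured matters. As a hedged sketch (not necessarily the exact methodology used here), daily lift can be computed as the percentage difference in per-user event rates between messaged users and a holdout group; the column names and group setup below are assumptions for illustration:

```python
import pandas as pd

def daily_lift(events: pd.DataFrame, n_messaged: int, n_holdout: int) -> pd.Series:
    """Percent lift in per-user daily event rates, messaged vs. holdout.

    `events` is assumed to have one row per app event, with columns:
      - 'date':  the calendar date of the event
      - 'group': either 'messaged' or 'holdout'
    Rates are normalized by group size so unequal groups compare fairly.
    Assumes every date has at least one holdout event.
    """
    counts = events.groupby(["date", "group"]).size().unstack(fill_value=0)
    messaged_rate = counts["messaged"] / n_messaged
    holdout_rate = counts["holdout"] / n_holdout
    return 100 * (messaged_rate - holdout_rate) / holdout_rate
```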

These results prompt two important questions. First: how did we do it? But second and more important: how does this impact actual revenue?

Aampe works by racking up marginal wins at scale

There are two key elements to Aampe’s success in engaging users:

  1. We generate all the data we need. Aampe doesn’t work by mining historical behavior - it creates historical behavior, trying all the different ways of approaching users and learning which ways work best. Don’t think of this as some kind of aggrandized A/B test or randomized controlled trial. Instead, think of it in terms of investment and hedging. Every message sent to a user is an investment in the idea that the details of that message will result in the user behavior we want to encourage. (In the case of the lift chart shown above, the details concerned what day of the week and what time of the day the notification hit each user’s phone.) We place those bets and monitor our returns.
  2. We distill learnings on a per-user basis. We summarize information about our investment returns into scores that create a “portfolio” for each individual user in your system. No segments. No user-defined workflows or rule sets. Just each individual user, each individual way we can alter a message, and the expected return for each user for each messaging choice (sketched in code just below).
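
To make that investment-and-hedging framing concrete, here’s a minimal sketch of per-user scoring - not Aampe’s actual implementation, just an illustration of the idea. Each (user, messaging choice) pair gets its own running tally, and choices are made by sampling from each pair’s posterior, which hedges automatically: uncertain bets still get tried, confident winners get favored.

```python
import random
from collections import defaultdict

class UserPortfolio:
    """One score per (user, messaging choice) pair: no segments, no rules.

    Each choice (e.g., a day-of-week/time-of-day slot) is a Beta(successes+1,
    failures+1) bet. Sampling from the posterior naturally hedges: low-
    confidence choices still get tried, high-confidence winners get favored.
    """

    def __init__(self):
        # (user_id, choice) -> [successes, failures]
        self.outcomes = defaultdict(lambda: [0, 0])

    def record(self, user_id, choice, converted: bool):
        """Log the return on one 'investment' (one message sent)."""
        self.outcomes[(user_id, choice)][0 if converted else 1] += 1

    def pick_slot(self, user_id, choices):
        """Thompson sampling: draw from each posterior, take the argmax."""
        def draw(choice):
            s, f = self.outcomes[(user_id, choice)]
            return random.betavariate(s + 1, f + 1)
        return max(choices, key=draw)

# Hypothetical usage: learn one user's best send-time slot.
portfolio = UserPortfolio()
slots = ["Fri 12:00", "Fri 15:00", "Fri 18:00"]
portfolio.record("user_42", "Fri 12:00", converted=True)
portfolio.record("user_42", "Fri 15:00", converted=False)
print(portfolio.pick_slot("user_42", slots))
```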

In short, Aampe positions itself to consistently motivate user behavior in ways that create a huge number of individually small wins. Think about it: if you target a segment, your assumption is that you can influence all of the users in that segment in the same way, getting many of them to take the same action. If you’re using segments, you’re going after a small number of big wins. That doesn’t work well, for reasons we’ll discuss in a moment, but it makes sense that companies so often pursue big wins: one big win is worth a lot of small wins, and crafting just the right message and sending it to just the right users at just the right time is hard.

Aampe turns that logic on its head: if one big win is worth 1,000 small wins, then use Aampe’s learning systems to amass 10,000 small wins. Using Aampe’s technology, it’s not any harder to personally target 10,000 different users than it is to target 10. We make personalization scalable, so you can easily pick up all the marginal wins that, up to now, you’ve been leaving on the table. 

That’s not the way most people think about user engagement, and it can take a little work to shift your perspective to the better way of doing things. To help with that, we’ve written a storybook that explains how we do things, as well as a big-ol’ technical memo for those of you who like to get into the weeds. But in the meantime, here’s a snapshot that shows how the learning played out over a few days for our e-commerce customer:

Each column is a day in late October or early November. Each row is a day of the week and time window when we sent messages to users. The colored bars in each row of each column show the percentage of users who got certain scores assigned to their portfolio. So if you look at the first column - October 27 - you can see that most rows are a single color, neither light nor dark. That means most users got a middling “personalization score” for most timing options, though there were exceptions: Friday at 15:00 and 18:00 weren’t very good times for most users, while Friday at 12:00 was a little better than the other slots.

Over the next three days, the system experimented with different hedging strategies, allocating different preferences over different timing options, but in all cases the bets placed were low-confidence (thus the blue is neither very light nor very dark for any slot during any of those days). By October 31, however, the system started to form some strong opinions, placing very high and very low confidence bets across different timing options, and assigning some users very high preference (dark blue) and other users very low preference (light blue) for the same timing option. The next day, the system pulled back on these strong bets, but had learned from them: you can see much more of a gradation in preferences within different timing options.

Because of particular dynamics in this customer’s user base, we had to skip learning on November 2. When the system placed bets for November 3, it continued as if nothing had happened, but it quickly learned that it had missed some important changes: thus, November 4 goes back to much more uniform preference allocations within each time slot. However, that one-day slowdown allowed the system to come back the next day with more diverse and more confident bets.

No human monitored these results or specified how allocations should change over time. Aampe’s systems learned from experience, and adjusted automatically. By the time this learning was happening, our customer literally only needed to sit back and watch the results roll in.

It should be apparent from the above graph that user preferences don’t stay the same over time. Here’s another way of looking at it:

The rows show the number of different days we generated preferences for every user in our customer’s user base. The columns show the number of times a user’s top preference changed during that time. Preferences change a lot - users are people, and people don’t always want the same thing to the same extent over time. Instead of trying to pick a winning approach for each user and unrealistically hoping that they don’t change, we built Aampe to change as fast as users do.
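
Tallying that kind of churn in preferences is straightforward. Here’s a sketch of one way to count, for each user, how many times their top-preference slot changed across daily preference generations - the column names are hypothetical:

```python
import pandas as pd

def top_preference_changes(prefs: pd.DataFrame) -> pd.Series:
    """Count how many times each user's top-preference slot changed across days.

    `prefs` is assumed to have one row per (user, day, slot) with columns
    ['user_id', 'day', 'slot', 'score']; higher score = stronger preference.
    """
    top = (
        prefs.sort_values("score")
        .groupby(["user_id", "day"])
        .tail(1)                 # each user's single top slot per day
        .sort_values("day")
    )
    # A change is any day whose top slot differs from the previous day's.
    return top.groupby("user_id")["slot"].apply(
        lambda s: int((s != s.shift()).iloc[1:].sum())
    )
```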

And just in case you’re tempted to say “well, we can figure out what days work best for most users, and just use those as rules to send out notifications on a set schedule”:

The above image shows the lift attributable to each day of the week and time window. Each dot is one week of messaging (each slot, by definition, occurred only once per week). The dots are semi-transparent, so darker dots are actually multiple overlapping dots. As you can see, there are very few slots that weren’t bad at least once - most were bad multiple times over just a few weeks. There were also very few slots that were consistently bad. The success of a given time slot changed based on user preferences and on the context of what was going on in those users’ lives.
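
That claim - almost no slot is reliably good or reliably bad - is easy to check once you have one lift number per slot per week. A sketch, with hypothetical column names:

```python
import pandas as pd

def slot_consistency(slot_lift: pd.DataFrame) -> pd.DataFrame:
    """Summarize how consistently each slot performed across weeks.

    `slot_lift` is assumed to have one row per (slot, week) - one dot in the
    plot - with a 'lift_pct' column holding that week's lift for that slot.
    """
    return slot_lift.groupby("slot")["lift_pct"].agg(
        weeks="count",
        bad_weeks=lambda s: int((s < 0).sum()),
        mean_lift="mean",
    )
```

If a fixed, rule-based schedule were viable, you’d expect at least one slot with zero bad weeks; the plot described above suggests slots like that are rare at best.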

Aampe doesn’t aim for single big wins. Those are elusive, expensive, and usually can’t be repeated (think about it: the perfect message isn’t the perfect message once you’ve sent it - because as soon as you send it, it’s stale). Instead, we focus on generating tons of fresh content (ideally pulled straight from the great content already created for the app) and constant betting and hedging to learn how each individual user wants to be contacted. That creates a steady stream of marginal wins that sum up to much more value than any single big win.

Pre-sale app events have a dollar value

The lift we demonstrated involved three different app events: adding to a wishlist, adding a product to a cart, and starting the checkout process. Of course, none of those events have actual value in and of themselves - even starting a checkout doesn’t mean anything if the user doesn’t finish the purchase. App events are signals of intent. They have a probabilistic relationship with revenue. However, the relationship between a push notification and something like an add-to-cart event is pretty straightforward. If you link to an item in the notification, and the user adds that item to the cart within a short time after receiving the notification, then attribution is reasonably clear.
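
That kind of attribution can be implemented as a simple windowed join. Here’s a minimal sketch (table names, column names, and the window length are hypothetical): count an add-to-cart as notification-attributed when the same user adds the linked item within a fixed window after the send.

```python
import pandas as pd

ATTRIBUTION_WINDOW = pd.Timedelta(minutes=30)  # hypothetical window

def attribute_carts(notifications: pd.DataFrame, carts: pd.DataFrame) -> pd.DataFrame:
    """Match add-to-cart events to the notification that plausibly drove them.

    `notifications`: one row per send, columns ['user_id', 'item_id', 'sent_at']
    `carts`:         one row per add-to-cart, columns ['user_id', 'item_id', 'added_at']
    Keeps a cart event only if the same user added the same (linked) item
    within ATTRIBUTION_WINDOW of the notification being sent.
    """
    merged = carts.merge(notifications, on=["user_id", "item_id"])
    in_window = (merged["added_at"] >= merged["sent_at"]) & (
        merged["added_at"] <= merged["sent_at"] + ATTRIBUTION_WINDOW
    )
    return merged[in_window]
```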

There are several reasons why we might try to influence a non-purchase event rather than influence the purchase event directly. In some cases, pricing information for purchase events may be kept in a point-of-sale system, whereas pricing for pre-purchase events is often more readily available in an app’s event logs, so focusing on pre-purchase events can lessen the technical burden of tying multiple systems together and let you start seeing value faster.

But even if none of that is an issue, a payment is a final destination: once you’ve reached the point that someone does or does not make a payment, you’re done. You either declare success, or you start over in trying to influence the user’s behavior. An add-to-cart event (or add-to-favorites, or add-to-wishlist, or even simple item view) is a success that sets the stage for more success. Knowing someone triggered one of these intermediary events gives you added context to know how to move them further along. Focusing only on the final payment ignores the fact that many (perhaps most) final successes are going to result from multiple interactions with the customer.

It’s relatively easy to measure lift for intermediary events like add-to-cart, and those events automatically suggest next steps for added customer engagement, but the intermediary events themselves aren’t the thing the business ultimately cares about. We can translate the value of these events into the value of the ultimate goal, discounted by the risk we face of not reaching that ultimate goal from the intermediary event. If sending a notification is an investment, then we can calculate value-at-risk for that investment.

For our e-commerce customer, we identified four primary sources of risk that stood in the way of an add-to-cart turning into revenue (we altered the precise dollar amounts for the purposes of this blog post, but the proportions are all accurate):

  • Abandonment. This is the percentage of users who add to cart but do not buy. There are lots and lots of different studies on this subject, pegging the average abandonment rate at anywhere between 60% and 90%. The plot above shows outcomes under assumptions of 60%, 75%, and 90%.
  • Lift. This is the percent increase in add-to-cart events obtained through notifications. The plot above shows estimates for lift values of 1%, 3%, 5%, and 7%, which is obviously much lower than what we actually achieved (and a whole lot lower than what we achieved for other customers). That’s the benefit of estimating value-at-risk - you can look at worst-case scenarios.
  • Margin. This is the percentage of revenue that is actually profit - revenue minus all the costs associated with getting a product to the customer. Again, there are lots of possibilities, especially for e-commerce, where the margin depends on the types of products you sell. We used margins here of 10%, 25%, and 40%.
  • User value. This is a hard one. Not all users are equally valuable. Typically, the most valuable 1% of your users will account for 10-20% of the total value in your user base. The risk is that these higher-value users are actually the ones more likely to abandon, in which case the abandonment rate is a much greater source of risk. (The sketch after this list shows how these factors combine.)
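
Here’s a minimal sketch of how those assumptions combine into a value-at-risk style grid. The scenario values are the ones listed above; the average cart value is a hypothetical placeholder, not the customer’s number:

```python
from itertools import product

AVG_CART_VALUE = 50.0  # hypothetical average value of one add-to-cart, in dollars

# Scenario values from the list above.
abandonment_rates = [0.60, 0.75, 0.90]
lifts = [0.01, 0.03, 0.05, 0.07]
margins = [0.10, 0.25, 0.40]

def incremental_profit_per_cart(abandonment, lift, margin, cart_value=AVG_CART_VALUE):
    """Expected profit from one baseline add-to-cart's worth of activity:
    lift adds incremental carts, (1 - abandonment) discounts them to
    purchases, and margin converts purchase revenue into profit."""
    return cart_value * lift * (1 - abandonment) * margin

for a, l, m in product(abandonment_rates, lifts, margins):
    print(f"abandonment={a:.0%} lift={l:.0%} margin={m:.0%} "
          f"-> ${incremental_profit_per_cart(a, l, m):.2f} per baseline cart")
```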

Having identified the sources of risk, we can either look at all of the different possibilities and decide which outcomes we are ok with, or we can work to narrow those assumptions. For example, having consistently achieved over 20% lift day after day, we could get rid of the different lift assumptions and just use an assumption of 20%. Likewise, for this particular customer, we were able to narrow down the uncertainty about which users abandon:

The table of red squares shows the abandonment rates of users in different value deciles. So the top row, labeled “10th”, is the top 10% of users (measured in terms of how much value they add to carts), and the bottom row, labeled “1st”, is the bottom 10% of users. The columns show seven different locations in which the company operates.

The first, third, fourth, and fifth columns show that high-value users are much more likely than low-value users to follow through with a purchase. The last two columns show some support for this, but might also be interpreted to mean that low-value users are no more likely to abandon than high-value users. The second column shows clear support for this “mixed-value” scenario, and none of the columns suggest that low-value users are more likely than any other users to follow through with a purchase. This allows us to change our risk view to focus on the mixed-value scenario as our worst-case scenario. 
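
Producing a table like that is a matter of bucketing users into value deciles and computing abandonment within each (decile, location) cell. A sketch, with hypothetical column names:

```python
import pandas as pd

def abandonment_by_decile(users: pd.DataFrame) -> pd.DataFrame:
    """Abandonment rate per (value decile, location).

    `users` is assumed to have one row per user with columns:
      - 'cart_value': total value the user added to carts
      - 'purchased':  whether the user completed a purchase (bool)
      - 'location':   the operating location
    """
    users = users.copy()
    # Decile 10 = the top 10% of users by value added to carts.
    users["decile"] = pd.qcut(users["cart_value"], 10, labels=list(range(1, 11)))
    rates = 1 - users.groupby(["decile", "location"], observed=True)["purchased"].mean()
    return rates.unstack("location")
```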

After we solidified all of these assumptions, we were able to form a much more definite expectation about the return-on-investment our customer could expect from using Aampe to manage user engagement messaging:

We assumed a 20% margin based on the customer’s own preference, and assumed 10% lift and the mixed-value scenario for abandonment based on our empirical investigation. That left the actual abandonment rate as the only unknown, so we were able to present a range of options. Among the user base in the particular country shown in the above plot, even a 90% abandonment rate would result in an average of over $0.50 per user in profit - and that assumes only half of the lift we were actually able to achieve day after day.
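
Under those locked-down assumptions, the per-user expectation reduces to one line of arithmetic. The function below is the same formula as in the grid sketch above; the input is an illustrative placeholder rather than the customer’s actual number:

```python
def profit_per_user(cart_value_per_user, lift=0.10, abandonment=0.90, margin=0.20):
    """Expected incremental profit per user: lift adds incremental cart value,
    (1 - abandonment) discounts it to purchases, margin converts it to profit."""
    return cart_value_per_user * lift * (1 - abandonment) * margin

# Illustrative only: under these assumptions, a user who adds about $250 of
# baseline value to carts clears roughly $0.50 in expected incremental profit.
print(profit_per_user(cart_value_per_user=250.0))  # ~0.50
```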

In conclusion: e-commerce apps can use messaging to create value for their customers and for their business. The best way to use messaging is to precisely measure the influence of notifications on specific individual app events - the actions a customer can take in the app - and tie those app events to bottom-line value for the company.