Schaun Wheeler
August 18, 2022
16 MIN READ
Engineering

Peek under the hood of our message-orchestration infrastructure

Communication at scale is different. Regular communication is a car. You need a spaceship.

Early on, when we introduce people to what Aampe does (see here for details), we often run into this sort of question:

“So…you’re basically just sending messages, right?”

Hoo boy. Take a seat. Let’s talk about this.

Anyone who has ever used a mobile device has an intuitive sense of what it means to send a message: 

  1. Decide who you want to talk to.
  2. Decide what you want to say.
  3. Send what you want to say to who you want to talk to.

Practically all marketing automation software out there mimics this same workflow. You decide who you want to talk to by setting up segments and defining campaigns, then you decide what you want to say by writing a message for a particular segment and campaign, and then you send (or schedule a send). Seems simple, right? Every one of us does the same thing on our phone every day - multiple times a day. 

How hard can it be, right?

Answer: really freaking hard.

To be fair, it’s very easy to follow the intuitive workflow for messaging when you only have to message your friends and family and maybe a few more distant acquaintances - you know all of those people, more or less, and you can assume that you and they share a common understanding of why you would reach out to them and what kind of response you’re hoping for. The messages you send every day are hand-crafted, because there aren’t that many of them, and because they can assume a certain social contract.

If you’re communicating with customers, however - especially if you’re an app with anywhere between tens of thousands and millions of users - you can’t assume the same social contract and you don’t have the luxury of artisanal communication. You can put all the time and effort you want into crafting the one perfect message…but when you send it to 100,000 users, it’s not perfect for all of them. It can’t be. We’re talking about 100,000 different people, with at least 100,000 different moods, preferences, assumptions, and interests. There is no one-size-fits-all. Even a (rare) one-size-fits-most is only going to fit a bare majority. Crafting that one perfect message is a waste of time because, no matter how carefully you shape it, it won’t be a fit for most of your users. We’re humans. We’re wonderfully diverse. That’s just the reality we live in.

Ditch your car. Use a spaceship.

Communication at scale requires a different set of tools than artisanal communication. Artisanal communication is a car. For communication at scale, you need a spaceship.

Here’s a diagram of all the moving parts of Aampe’s spaceship:

Believe it or not, this is the non-technical version of the diagram. The big boxes represent different automation workflows. Smaller boxes with rounded corners represent major automation tasks. Trapezoids represent different AI tasks. The two big boxes in blue are so extremely over-simplified that we’re going to give each of those sections their own full diagrams in future posts, but the super-simple version will work for our current purposes.

Start with messages

Look at the USER INPUTS section in the top-left corner of the diagram. It’s in blue, meaning it deserved a whole diagram of its very own (which we will link to when we publish it). The User Inputs workflow is partially automated: our customers create labels that represent different business priorities, and then they use those labels to tag formulas. A formula is a message template - you write a base message, and then you identify building blocks within the message, and then swap out building blocks for alternates to turn one message into many. 

That sounds more complicated than it really is. Take a look at this video which walks through the entire process in just a few minutes.
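
For readers who think in code, the formula idea (one base message, swappable building blocks, many resulting messages) can be sketched in a few lines of Python. This is only an illustration, not Aampe’s implementation: the function name `expand_formula`, the `{slot}` placeholder syntax, and the example copy are all assumptions made up for this sketch.

```python
from itertools import product

def expand_formula(base, alternates):
    """Turn one base message into every combination of its building blocks.

    `base` uses {slot} placeholders; `alternates` maps each slot name to
    the interchangeable phrasings for that building block.
    """
    slots = sorted(alternates)
    messages = []
    for combo in product(*(alternates[s] for s in slots)):
        choices = dict(zip(slots, combo))
        # Keep the choices next to the text so later steps know
        # which building blocks each message contains.
        messages.append({"text": base.format(**choices), "choices": choices})
    return messages

# Hypothetical formula: three building blocks with 2, 2, and 3 alternates.
base = "{greeting} {offer} {call_to_action}"
alternates = {
    "greeting": ["Hi there!", "Good news!"],
    "offer": ["Shipping is free this week.", "Prices just dropped."],
    "call_to_action": ["Take a look.", "Grab yours now.", "See what's new."],
}
messages = expand_formula(base, alternates)
```

Seven short phrases turn into 2 x 2 x 3 = 12 distinct messages, and each one carries a record of the choices that produced it.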

Having lots and lots of labeled messages gives Aampe’s learning systems the material they need to discover user preferences. They do that by identifying goals (notification clicks, add-to-cart events, checkouts - whatever app activity most interests you), and orchestrating campaigns (which are different from the campaigns you see in marketing automation systems…but we’ll talk more about that another time).

So the User Inputs workflow takes information about labels, formulas, goals, and campaigns, and outputs tons and tons of messages that are discovery tools as well as simple packets of communication. All of the information Aampe needs to learn user preferences is baked right into the messages we send.

Look at your users, not just their clicks

The EVENT INGESTION workflow gives us the data we need to learn from messaging. We don’t require SDKs or anything like that. Our customers set up a simple daily dump of data from the CDP into a bucket on Google Cloud Platform. If they can only give us their click data, then great - we’ll optimize clicks. Most of our customers opt to give us more data than that - we get detailed information about what each user does within the app. This allows us to learn more efficiently, but also allows our system to optimize for events that really matter - events that generate revenue.

Dumping the data into a bucket is all our customers need to do, but once they’ve done that, we deduplicate, clean, and process the data to make sure we’re feeding minimal noise into our learning systems.
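
To make “deduplicate, clean, and process” a little more concrete, here is a minimal sketch of that kind of pass over a daily dump. The field names (`user_id`, `event_name`, `timestamp`) and the cleaning rules are assumptions for illustration; a real export will have its own schema and its own quirks.

```python
def clean_events(raw_events):
    """Drop malformed rows and exact duplicates, then sort by time."""
    required = ("user_id", "event_name", "timestamp")
    seen = set()
    cleaned = []
    for event in raw_events:
        if any(not event.get(field) for field in required):
            continue  # skip rows missing a required field
        # Normalize the event name so "Add_To_Cart" and "add_to_cart " match.
        key = (event["user_id"], event["event_name"].strip().lower(), event["timestamp"])
        if key in seen:
            continue  # same user, same event, same moment: a duplicate
        seen.add(key)
        cleaned.append({"user_id": key[0], "event_name": key[1], "timestamp": key[2]})
    # ISO-8601 timestamp strings sort chronologically as plain text.
    return sorted(cleaned, key=lambda e: e["timestamp"])

raw = [
    {"user_id": "u1", "event_name": "Add_To_Cart", "timestamp": "2022-08-01T10:00:00"},
    {"user_id": "u1", "event_name": "add_to_cart ", "timestamp": "2022-08-01T10:00:00"},
    {"user_id": "u2", "event_name": "checkout", "timestamp": "2022-08-01T09:00:00"},
    {"user_id": None, "event_name": "checkout", "timestamp": "2022-08-01T11:00:00"},
]
cleaned = clean_events(raw)
```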

To figure out where to go next, get a map

The ATTRIBUTE SETUP workflow uses the ingested, deduplicated, cleaned, processed events data to create what we call the “User Landscape”. (You can read more about the User Landscape here.) The User Landscape is a summary of all the baseline behavior for each user. We start by aggregating events data for different events and different time periods. Then we transform those aggregations to ensure they are all represented on a similar scale (throwing data of different scales together into an analysis can cause some data to have an outsized impact on model estimates). We then denoise and condense the whole thing through a machine-learning process called dimensionality reduction (and here’s a Wikipedia link in case you want to learn more about it).
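
The first two steps (aggregate, then rescale) are easy to show in code. The sketch below assumes per-user event counts as the aggregation and a log transform followed by z-scoring as the rescaling; those specific choices, and the function names, are illustrative guesses rather than a description of what Aampe runs. Aggregating over separate time periods and the dimensionality reduction step itself are left out to keep the example short.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def aggregate_counts(events, event_names):
    """Count each event type per user: {user_id: [count per event name]}."""
    counts = Counter((e["user_id"], e["event_name"]) for e in events)
    users = sorted({e["user_id"] for e in events})
    return {u: [counts[(u, name)] for name in event_names] for u in users}

def scale_features(table):
    """Log-transform, then z-score each column so features share a scale."""
    users = list(table)
    logged = [[math.log1p(v) for v in table[u]] for u in users]
    scaled_columns = []
    for col in zip(*logged):
        mu, sigma = mean(col), pstdev(col)
        # A column with no variation carries no information: map it to zeros.
        scaled_columns.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return {u: list(row) for u, row in zip(users, zip(*scaled_columns))}

events = (
    [{"user_id": "u1", "event_name": "view"}] * 3
    + [{"user_id": "u1", "event_name": "checkout"}]
    + [{"user_id": "u2", "event_name": "view"}]
    + [{"user_id": "u3", "event_name": "view"}] * 10
    + [{"user_id": "u3", "event_name": "checkout"}] * 2
)
table = aggregate_counts(events, ["view", "checkout"])
scaled = scale_features(table)
```

Without the rescaling, a high-volume event like views would swamp a rare but important one like checkouts simply because the raw numbers are bigger.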

The User Landscape is the output of the dimensionality reduction process - it’s a machine-readable map of every user’s similarity to every other user. Throughout our learning, we need to make sure that similar users get different messages and different users get similar messages, and we need to measure the impact of those similarities and differences on user behavior. The User Landscape lets us do that. It keeps our models honest by forcing the models to think about what each user was likely to do anyway before concluding that a particular messaging choice had an impact.

Figure out timing

We find customers are often so focused on crafting good copy for their messages that they forget an even more important consideration: the timing. A brilliant message isn’t brilliant if a user gets it at a time when they can’t or won’t respond to it. Our systems learn timing preferences, which requires a number of preparatory steps. You can see these in the TIMING ASSIGNMENT PREP workflow.

First we define windows to structure our preference learning. It would be silly to try to learn users’ preferences between 6:01pm and 6:02pm. For most customers, we find it works well to bin the hours of each day into five three-hour windows, which cover typical waking hours. Five time windows times seven days of the week creates 35 windows for which we can learn preferences.
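
Here is what that binning might look like in code. The post only says the five windows cover typical waking hours, so the 07:00 start (and therefore the 22:00 end) is an assumption made for this sketch, as is the function name `timing_window`.

```python
from datetime import datetime

WINDOW_START_HOUR = 7   # assumption: the first window opens at 07:00
WINDOW_HOURS = 3
WINDOWS_PER_DAY = 5     # five three-hour windows: 07:00 up to 22:00

def timing_window(moment):
    """Return a window index in 0..34, or None outside the covered hours."""
    offset = moment.hour - WINDOW_START_HOUR
    if not 0 <= offset < WINDOW_HOURS * WINDOWS_PER_DAY:
        return None
    # weekday() is 0 for Monday through 6 for Sunday.
    return moment.weekday() * WINDOWS_PER_DAY + offset // WINDOW_HOURS

# 6:01pm and 6:02pm on the same day land in the same window.
a = timing_window(datetime(2022, 8, 18, 18, 1))
b = timing_window(datetime(2022, 8, 18, 18, 2))
```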

The rest of our timing prep involves determining which users have a good chance of responding to a message. We use an algorithm to create an activity estimate that tells us the probability that we’re going to see each user on the app again, given how long it’s been since we saw them last. We also calculate a responsiveness estimate, which tells us the probability that a user will respond to a message, given how many times we’ve messaged them and how many times they’ve responded to those messages in the recent past. We use both of these estimates to set the message cadence - which determines how often we enter a user into a campaign to be messaged, as well as how many messages we send them once they’re in the campaign.
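
The post doesn’t spell out the algorithms behind these estimates, so the sketch below swaps in simple stand-ins to show the shape of the idea: an exponential decay for the activity estimate, a smoothed response rate for the responsiveness estimate, and a cadence that scales with both. Every formula, name, and default here (`half_life_days`, `max_per_week`, and so on) is an assumption, not Aampe’s method.

```python
def activity_estimate(days_since_seen, half_life_days=14.0):
    """Stand-in: the chance of seeing a user again halves every half-life."""
    return 0.5 ** (days_since_seen / half_life_days)

def responsiveness_estimate(messages_sent, responses):
    """Stand-in: response rate smoothed so new users start at 0.5."""
    return (responses + 1) / (messages_sent + 2)

def messages_per_week(activity, responsiveness, max_per_week=5):
    """Stand-in cadence: more messages for active, responsive users."""
    return round(max_per_week * activity * responsiveness)

# A user seen today who has answered 2 of the last 8 messages.
activity = activity_estimate(0)
responsiveness = responsiveness_estimate(8, 2)
cadence = messages_per_week(activity, responsiveness)
```

The smoothing matters for new users: with no messaging history, a raw rate would be 0/0, while the smoothed version starts at a neutral 0.5 and moves as evidence comes in.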

Boost responsiveness with recommendations (optional)

Many of our customers have seen a lot of value from incorporating personalized item recommendations into their messaging. They don’t have to have their own RECOMMENDER SYSTEM to do this - we provide one as part of Aampe’s overall infrastructure, fed by the event data we already incorporate into other parts of the automation process.

The system is simple: we create an item-item matrix that models the likelihood of viewing, adding to cart, buying, or otherwise interacting with any item in the app inventory, given the items users have interacted with previously. We can feed any user’s interaction history into that matrix and get a list of personalized recommendations. We can augment that list (and overcome the “cold start” problem inherent in most recommender systems) with recommendations based on overall popularity. We combine those two lists to select as many recommendations as we need for each user. 
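
A stripped-down version of that flow looks something like this. It uses raw co-occurrence counts as a stand-in for the item-item likelihoods, which is a simplification chosen for this sketch; the function names and the sample data are likewise made up.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_item_item(histories):
    """Count how often each pair of items appears in the same user's history."""
    matrix = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            matrix[a][b] += 1
    return matrix

def recommend(history, matrix, popularity, k):
    """Personalized picks first, then popular items to fill any gap."""
    seen = set(history)
    scores = Counter()
    for item in seen:
        for other, count in matrix.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Sort by score, breaking ties alphabetically so results are stable.
    picks = [i for i, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
    for item, _ in sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0])):
        if item not in seen and item not in picks:
            picks.append(item)
    return picks[:k]

histories = {
    "u1": ["shoes", "socks"],
    "u2": ["shoes", "socks", "hat"],
    "u3": ["hat", "scarf"],
    "u4": ["shoes", "laces"],
}
matrix = build_item_item(histories)
popularity = Counter(i for items in histories.values() for i in set(items))

for_shoe_buyer = recommend(["shoes"], matrix, popularity, k=3)
for_new_user = recommend([], matrix, popularity, k=2)
```

The new user has no history at all, so their list comes entirely from the popularity fallback, which is the cold-start case the paragraph above describes.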

Figure out copy

Once we know when we’re going to message users, and what we’re going to message them about (if we use the recommender system), we still need to decide how we’re going to talk to them. This is COPY ASSIGNMENT PREP.

First, we determine copy availability. Not all copy is available for all users at all times. Sometimes we run campaigns that only function in certain countries or for certain languages. Sometimes users are only added to a campaign if they trigger a particular event in the app, such as adding an item to their cart. Sometimes formulas themselves are available for only a limited time (this is common when our customers run time-sensitive sales). We look at which users are available to be messaged, what campaigns those users are eligible for, and what messages are valid for those campaigns. We then insert the recommendations (if applicable) into the message text. 
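
Those availability checks are essentially a chain of filters, which the following sketch tries to capture. The data layout (dictionaries with `countries`, `trigger_event`, `formulas`, `start`, `end`) and the `{item}` placeholder are hypothetical, invented here to make the example runnable.

```python
from datetime import date

def eligible_messages(user, campaigns, today):
    """Return message texts the user can receive today, with items filled in."""
    texts = []
    for campaign in campaigns:
        if user["country"] not in campaign["countries"]:
            continue  # campaign doesn't run where this user is
        trigger = campaign.get("trigger_event")
        if trigger and trigger not in user["recent_events"]:
            continue  # user hasn't done the event that opens this campaign
        for formula in campaign["formulas"]:
            if not formula["start"] <= today <= formula["end"]:
                continue  # time-limited copy that isn't live today
            texts.append(formula["text"].format(item=user.get("recommendation", "")))
    return texts

user = {"country": "SG", "recent_events": ["add_to_cart"], "recommendation": "running shoes"}
campaigns = [
    {
        "countries": {"SG", "ID"},
        "trigger_event": "add_to_cart",
        "formulas": [
            {"text": "Still thinking about {item}?", "start": date(2022, 8, 1), "end": date(2022, 8, 31)},
            {"text": "July sale: {item} is 20% off!", "start": date(2022, 7, 1), "end": date(2022, 7, 31)},
        ],
    },
    {
        "countries": {"US"},
        "formulas": [{"text": "New arrivals are here.", "start": date(2022, 1, 1), "end": date(2022, 12, 31)}],
    },
]
available = eligible_messages(user, campaigns, date(2022, 8, 18))
```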

Now that we have complete messages, we process the labels attached to each message to allow the system to know what messaging choices each particular message encapsulates. We then match labels to personalization scores (we’ll talk more about those scores, below). These scored messages give us an estimate of how strongly each possible message aligns with each user’s preference profile. 

Assign message choices (this is harder than it looks)

Everything we’ve discussed so far was just preparation to be able to actually send a message to a user. That’s because, with Aampe, messaging serves two purposes: to delight your users now, and to figure out what will delight them next. Accomplishing those two purposes at the same time takes a lot of preparation, some experimental design, and a whole lot of AI. The decision of when, how, and about what to message each user is called ASSIGNMENT.

First, we compile all of the personalization scores for all users. Each score represents an individual user’s preference for an individual messaging decision. If you write some messages to emphasize the quality of your product, each user will have a score estimating how likely they are to respond to a message about quality. If you write some messages emphasizing the low cost of your product, each user will have an estimate of how likely they are to respond to a message about cost. The same goes for all of the timing windows we set up - everyone gets a score for each window.

We aggregate those scores for each user to estimate how likely a user is to respond positively if we message them with the best set of choices we have for them, personally, at the moment. Some users would respond to practically any kind of message sent at any time. Some users won’t respond to anything, no matter what it’s about, or when it’s sent. Most users are somewhere in between. We algorithmically select users for preferential assignment, which means we send them what we think will be the best when, what, and how we have. If they don’t get selected for preferential assignment, we put them into conditioned assignment, which means we recognize that we don’t have a “best bet” for them, so we put them into an experiment to learn what they like.
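
As a rough picture of that split, the sketch below adds each user’s best timing score to their best copy score and compares the total to a cutoff. The selection algorithm isn’t described in this post, so the simple sum and the fixed `threshold` are placeholders chosen for this example, not the real selection rule.

```python
def best_bet(scores):
    """Pick one user's top timing window and top copy label."""
    window = max(scores["timing"], key=scores["timing"].get)
    label = max(scores["copy"], key=scores["copy"].get)
    strength = scores["timing"][window] + scores["copy"][label]
    return window, label, strength

def split_assignment(all_scores, threshold=0.5):
    """Separate users with a clear best bet from users we still need to learn about."""
    preferential, conditioned = {}, []
    for user_id, scores in all_scores.items():
        window, label, strength = best_bet(scores)
        if strength >= threshold:
            preferential[user_id] = (window, label)
        else:
            conditioned.append(user_id)  # no best bet yet: goes to the experiment
    return preferential, conditioned

all_scores = {
    "u1": {"timing": {"mon_am": 0.4, "fri_pm": 0.1}, "copy": {"quality": 0.3, "cost": 0.05}},
    "u2": {"timing": {"mon_am": 0.02, "fri_pm": 0.03}, "copy": {"quality": 0.01, "cost": 0.0}},
}
preferential, conditioned = split_assignment(all_scores)
```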

The details of conditioned assignment are illustrated here, but to be brief: we pull in the User Landscape we calculated from all of those aggregated app events, and transform the landscape to make it usable in a clustering algorithm, where each cluster represents a group of similar users. We then assign each possible messaging choice within each cluster. This spreads all of our different messaging possibilities across all of our different kinds of users, which sets us up to learn efficiently.
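
Once users have cluster labels, spreading the choices around is the simple part. This sketch takes the cluster labels as given (the clustering itself is out of scope here) and deals choices out round-robin within each cluster; a real system would presumably randomize, and the names below are hypothetical.

```python
from collections import defaultdict
from itertools import cycle

def assign_within_clusters(cluster_of, choices):
    """Deal messaging choices round-robin to the users in each cluster."""
    members = defaultdict(list)
    for user_id in sorted(cluster_of):
        members[cluster_of[user_id]].append(user_id)
    assignment = {}
    for users in members.values():
        # Restart the cycle per cluster so every cluster sees every choice.
        for user_id, choice in zip(users, cycle(choices)):
            assignment[user_id] = choice
    return assignment

cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 1}
choices = ["quality", "cost", "speed"]
assignment = assign_within_clusters(cluster_of, choices)
```

Because each cluster of similar users receives the full range of choices, differences in response within a cluster point to the message rather than to the kind of user who received it.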

Actually sending the messages is the only easy part

The actual MESSAGING step is simple: we have a list of users and each user has a message assigned to them, so we send that information to whatever communication provider our customers like to use (Braze, etc.), and then monitor that system to flag any messaging failures. Easy-peasy.

The personalization models are where the magic happens

Now we come to the actual MODEL. Notice that this workflow is outlined in blue on the diagram. That means we’ll have another post soon that explains this section in more detail. Here are the basics:

First, we compile a feature set for the model. This includes the values for the User Landscape, the personalization scores used to make all the decisions about when and how to message each user, as well as indicators showing which particular messaging choices were made for each individual user. We also compile outcomes - we monitor the app event data for a period of time after each message was sent to see if the user did the goal events that we hoped they would do. In cases where they did do those events, we compile weights that indicate how much attention the model should pay: if a user did a lot of the goals right after the message we sent, the model pays a lot of attention, but if a user did only one goal a long time after being messaged, the model only pays a little attention.
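
One simple way to get a weight with that behavior (more goals means more weight, longer delays mean less) is to sum a decaying contribution per goal event. The decay form and the 24-hour half-life below are assumptions picked for illustration; the post doesn’t say how the actual weights are computed.

```python
def outcome_weight(goal_delays_hours, half_life_hours=24.0):
    """Sum a decayed contribution for every goal event after a message.

    Each goal counts for 1.0 if it happened immediately and half as much
    for every `half_life_hours` that passed since the message was sent.
    """
    return sum(0.5 ** (delay / half_life_hours) for delay in goal_delays_hours)

# Three goals within a few hours of the message vs. one goal three days later.
eager = outcome_weight([1, 2, 3])
late = outcome_weight([72])
```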

The model training incorporates the feature set, outcomes, and weights. The particular algorithm we use learns contingencies - if a user is in a certain place in the User Landscape, and has a particular preference profile, and received a certain message timing and copy, and triggered the desired goal events, then the model will learn that similar users with similar preference profiles are likely to do the same if exposed to the same messaging conditions. This allows us to infer what each user might do if presented with a particular message, even if they haven’t received that particular message yet.

We use the model to generate two different estimates: a baseline probability to respond (which only looks at users’ historical behavior) and a target probability to respond (which considers each possible messaging choice one by one). We compare the probabilities to calculate a preference - what we call a “personalization score”, which represents the degree to which the target probability outpaces the baseline probability. So if we write a message that emphasizes how fast delivery is, each user will get a personalization score that tells us how much more likely they are to respond to a message about fast delivery than they would be otherwise.
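
In code, the comparison can be as small as a subtraction. Whether the real score is a difference, a ratio, or something else isn’t stated here, so treat the plain difference below, along with the function name and the example probabilities, as an assumption.

```python
def personalization_scores(baseline, targets):
    """Score each messaging choice by how far it moves a user off baseline.

    `baseline` is the user's probability to respond based on history alone;
    `targets` maps each messaging choice to the probability with that choice.
    """
    return {choice: p - baseline for choice, p in targets.items()}

# Hypothetical user: 10% baseline, with two candidate messaging choices.
scores = personalization_scores(0.10, {"fast_delivery": 0.16, "low_cost": 0.08})
best_choice = max(scores, key=scores.get)
```

A negative score is informative too: for this made-up user, a low-cost message does worse than the baseline, which is a reason to avoid it.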

Use messaging experiments to understand your users better

Aampe’s learning systems allow you to automatically adapt your messaging to each individual user, but our infrastructure also enables a whole new world of REPORTING that allows you to understand those users better.

We check all the boxes on normal reporting - app activity metrics that give you basic insights into what your users are doing, and how active they are, as well as message metrics that help you understand how and how often the system is communicating for you. We also provide a lot of different conversion metrics that help you understand which messaging choices tend to correspond to the most success.

We also add in some more advanced reporting that gives you greater assurances of return-on-investment and greater insights into your users. For customers who enable the recommender system, we’re able to tie messages that mention a specific item to purchase events that involve those same items. This recommender attribution draws a clear line between user communication and user lifetime value. Even in cases where we don’t have a specific item to work off of, our message attribution method employs a control group to estimate the impact of the messages the system sends. In addition to these attribution methods, we also use personalization scores (which are our estimate of how confident the system was in its messaging choices) to estimate ROI in terms of user actions (add-to-cart, checkout, etc.) or even monetary value such as revenue or margin.
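
The control-group idea boils down to comparing conversion rates between users who were messaged and users who were held out. The sketch below shows only that basic arithmetic, with made-up numbers and a hypothetical function name; it is not a description of Aampe’s attribution method, which may adjust for much more.

```python
def message_lift(treated_conversions, treated_users, control_conversions, control_users):
    """Estimate the impact of messaging from a held-out control group.

    Returns the difference in conversion rate and the estimated number of
    extra conversions among messaged users that the messages account for.
    """
    treated_rate = treated_conversions / treated_users
    control_rate = control_conversions / control_users
    lift = treated_rate - control_rate
    return lift, lift * treated_users

# Made-up numbers: 120 of 1,000 messaged users converted vs. 40 of 500 held out.
lift, incremental = message_lift(120, 1000, 40, 500)
```

With these numbers the messaged group converts at 12% against 8% for the control group, so roughly 40 of the 120 conversions are attributed to the messages rather than to what users would have done anyway.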

We also pipe some of these insights back into our COMPOSER tool where personalization metrics for specific labels or formulas help you understand what has worked so far, and our personalization opportunity matrix helps you make decisions about what messages to write next.

Messaging at scale demands a different way of communicating

Can you use standard marketing automation software to write messages the way you’re used to writing them? Of course you can. Your users will hate it. You can’t send the same message out to thousands of people at one time, or send very similar messages out to the same thousands of people over and over again, or both, without most of them realizing what you’re doing. People are smart enough to recognize when you’re trying to nudge them into doing things you want them to do, and they naturally resent the attempt. 

There’s a better way, of course. Instead of deciding what you want your users to do and then pestering them in hopes that a few will give in and do it, figure out what each individual user wants from the list of things your app offers, and make it easy for them to get that thing they want, when they want it. At its worst, messaging is a pesky reminder of something you don’t want to be reminded of. At its best, messaging is a chance to skip the line, avoid the menus of menus in the app, and get right to the movie, the food delivery, the dress, or the shoes that you want. 

You don’t get that kind of magic from message blasts. You get it from a system that takes your priorities as a business, and then learns and adapts to match your app’s offerings up to your users’ individual priorities. That’s not simple, and it’s not easy. That’s why Aampe exists - we built all the complicated stuff and put it under the hood, so you can just focus on flying the spaceship.