you assume everyone’s the same. 99.9% of tests are of this variety: the random-nth A/B test.
Hidden in many “losing” test results is a test idea that worked for some and not others.
Here are the results of a World Vision test in which prospective donors were randomly split between the control (altruism appeal) and the test (self-interest appeal).
Nothing to see here. The test didn’t win. Discard, move on, declare failure. That is the best choice if the alternative is breaking out response rates by arbitrary groupings within the test group. Append enough third-party data, demographics, or channel behavior and you’ll find plenty of subgroups within the test group that beat the control.
There is a 99.9% chance this is random noise, but it may feel like signal because our brains are trained to see patterns and explanations, even where none exist.
Consequently, our bias is to find something in what is actually nothing, and then to come up with a reason why it’s something: e.g., “people in our low-dollar, high-recency RFM bucket loved the test,” or “people who collect stamps and skew to Generation Whatever hated our test.”
But if your test wasn’t designed around these groups beforehand, hunting for these random differences after the fact is a waste of time and money.
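To see why sliced results mislead, here is a minimal sketch in Python. It assumes independent subgroup comparisons judged at a 5% significance level (real subgroups built from appended data overlap and are correlated, so treat this as an approximation). The function names, the 2% response rate, and the 20 subgroups are hypothetical illustrations, not figures from the World Vision test.

```python
import random


def chance_of_false_winner(num_subgroups: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_subgroups independent comparisons
    looks 'significant' at level alpha when there is no real effect."""
    return 1.0 - (1.0 - alpha) ** num_subgroups


def simulate_null_slices(num_donors=20000, num_subgroups=20,
                         base_rate=0.02, seed=1):
    """Simulate an A/B test with NO true difference between arms, then slice
    the test group into arbitrary subgroups and count how many of them show
    a higher response rate than the control."""
    rng = random.Random(seed)
    # Both arms share the same true response rate: the test does nothing.
    control = [rng.random() < base_rate for _ in range(num_donors)]
    test = [rng.random() < base_rate for _ in range(num_donors)]
    control_rate = sum(control) / len(control)

    # Hypothetical stand-in for appended demographics or channel data:
    # each test donor lands in an arbitrary bucket.
    buckets = [[] for _ in range(num_subgroups)]
    for responded in test:
        buckets[rng.randrange(num_subgroups)].append(responded)

    # Count subgroups that "beat" the control purely by chance.
    return sum(1 for b in buckets if b and sum(b) / len(b) > control_rate)
```

With 20 slices and a 5% threshold, `chance_of_false_winner(20)` is about 0.64, so at least one "significant" subgroup is more likely than not even though the test did nothing. The simulation typically shows roughly half of the arbitrary subgroups beating the control, each one ready-made for a story.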
What about designing your test on the assumption that people will respond differently based on who they are? The World Vision test was such a test. The first bar chart shows what we expected: no difference when you mash everybody together.
The altruism appeal, dubbed the control because it’s more typical in the sector, framed the ask as:
- “Any donation you make will improve the happiness and wellbeing of an African family.”
The researchers didn’t think this would work well for everyone; they expected it to work for people high in Agreeableness, an innate personality trait.
The self-interest appeal framed the ask as:
- “Research by psychologists shows that donating money to charity increases the happiness and wellbeing of the giver.”
Again, this wasn’t a test idea developed to work for everyone, only for those low in Agreeableness and higher in Neuroticism, another innate personality trait.
Why Personality? It travels well. It’s part of you. It’s measurable and targetable, and we know how to tailor messages to traits, making the appeal more likely to be seen, agreed with, and acted on.
Framing appeals to match who I am rather than who I am not seems obvious, except that obvious is sometimes in short supply.
The experiment measured Personality traits after the donation. What did they find?
The test won among those low in Agreeableness and higher in Neuroticism, and lost among those high in Agreeableness.
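A quick way to check whether a gap in lift between two named groups is more than noise is to compare the test-vs-control lift in one group against the lift in the other. The sketch below swaps in a generic 2x2 difference-in-differences on response rates with a normal approximation; it is not necessarily how the World Vision researchers analyzed their data. The function name, segment labels, and counts are hypothetical, and the approximation assumes independent cells with reasonably many responders.

```python
import math


def interaction_contrast(cells, target, comparison):
    """Compare the test-vs-control lift in a target segment with the lift in a
    comparison segment (a 2x2 difference-in-differences on response rates).

    cells maps segment -> {"control": (responders, n), "test": (responders, n)}.
    Uses a normal approximation and assumes the four cells are independent.
    """
    def rate(segment, arm):
        responders, n = cells[segment][arm]
        return responders / n

    def variance(segment, arm):
        p = rate(segment, arm)
        return p * (1 - p) / cells[segment][arm][1]

    lift_target = rate(target, "test") - rate(target, "control")
    lift_comparison = rate(comparison, "test") - rate(comparison, "control")
    interaction = lift_target - lift_comparison

    # Standard error of the difference of two differences of proportions.
    se = math.sqrt(sum(variance(s, a)
                       for s in (target, comparison)
                       for a in ("control", "test")))
    z = interaction / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"lift_target": lift_target, "lift_comparison": lift_comparison,
            "interaction": interaction, "z": z, "p": p_two_sided}


# Hypothetical counts for illustration only (not the study's results).
cells = {
    "low_agree": {"control": (20, 1000), "test": (30, 1000)},
    "high_agree": {"control": (30, 1000), "test": (20, 1000)},
}
result = interaction_contrast(cells, "low_agree", "high_agree")
```

In this made-up case, pooling the segments gives a 2.5% response rate in both arms, the same null-looking overall result, while one segment shows a 1-point lift and the other a 1-point drop. The contrast is only meaningful because the segments were named before the results came in; applied to groupings found after the fact, it is the same subgroup fishing described earlier.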
Framing matters. But it matters differently for different people.
You do have different donor segments, but their ‘why’ of giving has zero to do with demographics, behavior, channel, or any other internally defined segmentation. It also has zero to do with random persona clusters created by throwing everything into a statistical blender.
Start with why people do what they do. Recognize that the answer isn’t the same for everyone, while knowing that groups of similar people exist. Build a test aimed at a specific group, but also include a group you don’t expect it to work for, so you can more fully establish cause and effect.
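One way to set up such a test is stratified random assignment: randomize separately within each segment so that both the target group and the group you don't expect it to work for get a balanced control and test. This is a minimal sketch that assumes each donor's segment (for example, a trait group from a prior survey) is known before the appeal goes out; the function name and segment labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assign(donor_segments, arms=("control", "test"), seed=0):
    """Randomly assign donors to arms separately within each segment, so every
    segment (e.g. a hypothetical trait group) gets near-equal arm sizes.

    donor_segments maps donor_id -> segment label, known before the test.
    """
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for donor_id, segment in donor_segments.items():
        by_segment[segment].append(donor_id)

    assignment = {}
    for segment in sorted(by_segment):
        # Sort first so the result depends only on the seed, not dict order.
        ids = sorted(by_segment[segment])
        rng.shuffle(ids)
        # Deal shuffled donors round-robin into the arms.
        for i, donor_id in enumerate(ids):
            assignment[donor_id] = arms[i % len(arms)]
    return assignment


# Hypothetical donor list: every third donor in the "high_agree" group.
donors = {f"d{i:03d}": ("high_agree" if i % 3 == 0 else "low_agree")
          for i in range(100)}
assignment = stratified_assign(donors, seed=42)
```

Blocking on the segment up front keeps each group's control and test the same size, so the per-segment comparison is planned rather than carved out of the results afterward.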
The random nth should die a quick death. It won’t. But it should.