What is A/B Testing?
A/B testing is a methodology commonly used in eCommerce to test new features or products. It is a way of making data-driven decisions about user interfaces, marketing, and products in general. The core process is to split users into a control group and an experiment group. The control group gets the existing product or feature and the experiment group gets the new one. We then record how each group responds and decide, based on that behavior, which version is better.
The most important rule is that all elements must stay constant except the one being tested.
If the results are reliable and repeatable, we can make the right decision. That is A/B testing at a high level.
More than two groups can also be used. In that case, it is called an A/B/N test.
A/B tests are also called randomized controlled experiments, controlled experiments, or split tests.
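One common way to implement the split, offered here as an assumption rather than the article's own method, is to hash a stable user ID together with an experiment name. Each user then lands in the same group on every visit, and the same function covers both A/B and A/B/N tests. The function name `assign_variant` and the experiment names below are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, experiment: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing "experiment:user_id" gives every user a stable, effectively
    random bucket: the same user always sees the same variant, and
    different experiments split users independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# A/B test: two groups (hypothetical experiment name)
print(assign_variant("user-42", "button-color", ["control", "experiment"]))

# A/B/N test: the same function handles any number of groups
print(assign_variant("user-42", "headline", ["A", "B", "C", "D"]))
```

Hashing, instead of drawing a fresh random number on each visit, keeps a user's assignment consistent without having to store it anywhere.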
What Can be Tested?
What to test is an important decision. Many things can be tested. For example, you can run an A/B test on a feature image, headline, subheadline, formatting, layout, writing style, button color, button position, algorithm performance, and much more on your website. Here is a popular example: Google ran an experiment on how different shades of blue affect user engagement. They showed different shades of blue to different groups to find out which shade got more clicks.
Some changes are invisible. Amazon ran an experiment on page load time and found that every 100 ms increase in load time decreased their sales by 1%. Load time is an invisible change: users never see it directly, but it still affects their behavior.
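As a rough, hypothetical extrapolation that is not part of Amazon's reported result, the sketch below turns the "1% per 100 ms" figure into an estimated sales loss for larger delays. It assumes the loss compounds for each 100 ms step. The function name `estimated_sales_loss` is illustrative.

```python
def estimated_sales_loss(delay_ms: float, loss_per_100ms: float = 0.01) -> float:
    """Fraction of sales lost for a given added delay.

    Hypothetical model: the ~1% loss per 100 ms is assumed to compound
    for every 100 ms step of extra load time.
    """
    steps = delay_ms / 100.0
    return 1.0 - (1.0 - loss_per_100ms) ** steps


for delay in (100, 300, 500):
    print(f"+{delay} ms -> about {estimated_sales_loss(delay):.2%} fewer sales")
```

For small delays, compounding and a simple linear "1% per 100 ms" give nearly the same answer (4.9% versus 5% at 500 ms); they only diverge for large slowdowns.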
Main steps of the A/B test
These are the main steps of an A/B test:
- Experiment Design
- Running Experiment
- Result to Decision
- Post Launch Monitoring
Experiment Design
a. Defining the key metrics: The key metric, also called the Overall Evaluation Criterion (OEC), must be something you can measure in practice. In the Google example above, if we want to test whether changing the shade of blue affects the number of clicks, the evaluation metric is the number of clicks.
b. The change should be easy to make: Changing the color of a button or adding a small feature is easy. But if the variant is a complete redesign of the website, it will be time-consuming and expensive to build and test. So the variant should be a reasonably scoped element.
c. Randomization unit: Randomization is a critical part of the A/B testing process. For example, suppose we are launching a new math course and testing it on two groups. We take the math scores of students from a top private school as the experiment group and the math scores of students from a poorly rated school as the control group. Do you think the result will be reliable?
No. We need to assign students at random so that both the control and experiment groups represent the overall population. That is how we get reliable results.
The randomization unit is the “who” or “what” that is randomly allocated to each group.
The more randomization units an experiment has, the smaller the effects it can detect. The most commonly used randomization unit is the user, and a typical experiment includes thousands of them.
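The key metric from item a. is usually normalized by group size. Raw click counts depend on how many users each group received, so a click-through rate (clicks divided by impressions) is easier to compare across groups. The log format (one `(variant, clicked)` pair per impression) and the variant names below are hypothetical.

```python
from collections import defaultdict


def click_through_rate(events):
    """Compute clicks / impressions for each variant.

    `events` is an iterable of (variant, clicked) pairs, one per
    impression, where `clicked` is True if the impression led to a click.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for variant, clicked in events:
        impressions[variant] += 1
        clicks[variant] += int(clicked)
    return {v: clicks[v] / impressions[v] for v in impressions}


# Hypothetical log for two shades of blue
log = [("blue_a", True), ("blue_a", False), ("blue_a", False), ("blue_a", True),
       ("blue_b", True), ("blue_b", True), ("blue_b", False), ("blue_b", True)]
print(click_through_rate(log))  # {'blue_a': 0.5, 'blue_b': 0.75}
```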
At this stage, we also need to work out the experiment's parameters. The four key parameters are:
a. What percentage (or number) of the population to include in the experiment.
b. How large the sample needs to be.
c. How long to run the experiment.
d. How statistically significant the results need to be (the significance level).
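These four parameters are linked. A minimal sketch, assuming a binary metric such as click/no click, a two-sided two-proportion z-test, and 80% power (a common default that the article does not specify): the standard normal-approximation formula gives the users needed per group (b) for a chosen significance level (d), and dividing by the share of daily traffic in the experiment (a) gives a duration (c). The function names and all numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """(b) Users needed per group to detect p_control -> p_treatment
    with a two-sided two-proportion z-test at significance level alpha (d)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p_control * (1 - p_control)
                                  + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


def days_to_run(n_per_group, n_groups, daily_users, traffic_fraction):
    """(c) Days needed when only traffic_fraction of daily users (a) join."""
    return ceil(n_per_group * n_groups / (daily_users * traffic_fraction))


# Smaller effects need many more users per group
for target in (0.12, 0.11, 0.105):
    print(f"10% -> {target:.1%}: {sample_size_per_group(0.10, target)} users per group")

n = sample_size_per_group(0.10, 0.12)
print(days_to_run(n, n_groups=2, daily_users=5_000, traffic_fraction=0.5), "days")
```

Because the day of the week can change user behavior, many teams round the resulting duration up to whole weeks.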
Running the Experiments
Now we come to the main part: running the experiment.
The first task is to collect the data carefully. This is the most important part. The data must represent your current situation; otherwise, the whole test is meaningless.
Next, choose the right statistical tool for your experiment. You need to know which kind of analysis to run: a confidence interval, a t-test, a chi-square test, or another test.
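For a binary metric such as click/no click, one common choice is a two-proportion z-test paired with a confidence interval for the difference in rates. Without a continuity correction, this z-test is equivalent to a chi-square test on the 2×2 table of clicks and non-clicks. A minimal sketch using only the Python standard library; the function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in click rates, plus a CI."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    diff = p_b - p_a
    # Pooled rate under the null hypothesis that both groups share one rate
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval of the difference
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {"diff": diff, "z": z, "p_value": p_value,
            "ci": (diff - z_crit * se, diff + z_crit * se)}


# Hypothetical counts: control 500/5,000 clicks, experiment 580/5,000 clicks
result = two_proportion_z_test(clicks_a=500, n_a=5_000, clicks_b=580, n_b=5_000)
print(result)
```

For a continuous metric such as revenue per user, a t-test on the per-user values would be the analogous choice.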
Test both variations simultaneously. Timing matters a great deal in A/B tests: the day of the week or month and the time of day often affect a marketing campaign or user behavior. Consider the Google shades-of-blue experiment mentioned earlier. The different shades of blue must be shown to the different groups at the same time. If one group sees one shade now and another group sees a different shade a month later, the results are not valid, because any difference could come from the timing rather than the color.
Allow enough time to produce useful results. Say you run a t-test every day during your experiment, get a p-value of 0.02 after 7 days (with a significance level of 0.05), and stop the test. Is that good enough? No. You stopped as soon as you saw one good result, and that result may have been a fluke. Checking every day and stopping at the first significant p-value raises the chance of a false positive well above the nominal 5%. Make sure your results are reproducible and repeatable.
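The danger of stopping at the first good p-value can be shown with a small simulation. The sketch below runs hypothetical A/A tests, where both groups share the same true click rate, so every "significant" result is a false positive. It compares checking a two-proportion z-test every day against a single check at the planned end. The daily traffic, rate, and number of simulations are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided pooled two-proportion z-test p-value."""
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (clicks_b / n_b - clicks_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_sims=500, days=14, users_per_day=200, rate=0.10,
                      alpha=0.05, seed=1):
    """A/A tests (no real difference): compare two stopping rules."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(n_sims):
        clicks_a = clicks_b = n = 0
        ever_significant = False
        for _ in range(days):
            clicks_a += sum(rng.random() < rate for _ in range(users_per_day))
            clicks_b += sum(rng.random() < rate for _ in range(users_per_day))
            n += users_per_day
            if p_value(clicks_a, n, clicks_b, n) < alpha:
                ever_significant = True  # a daily peek would have stopped here
        peeking_hits += ever_significant
        final_hits += p_value(clicks_a, n, clicks_b, n) < alpha
    return peeking_hits / n_sims, final_hits / n_sims


peek_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with daily peeking: {peek_rate:.1%}")
print(f"false positives with one planned look: {fixed_rate:.1%}")
```

The single planned look stays close to the chosen 5% level, while daily peeking declares a winner far more often. Fixing the sample size or duration before the test starts avoids this problem.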
Take feedback from the users. Everything above covers the quantitative side of the test, but it is also a good idea to get qualitative feedback from users. A survey or poll can help here. For example, an exit poll on your website can ask users why they filled out a form, why they did or did not click a certain button, or anything else you care about. This kind of poll lets you interact with your audience directly and get useful feedback.
Result to Decision
Turning results into a decision can be tricky. You will often face a tradeoff between two metrics. For example, you may find that user engagement goes up while revenue goes down. Which one should win? That depends on your company's goals.
Another important factor in the decision is the cost of launching the change. Is the cost feasible? If the cost is high, the benefits should clearly outweigh it.
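One simple way to weigh cost against benefit, offered as a hypothetical sketch rather than a prescribed rule, is to estimate how many months the measured lift needs to pay back the launch cost. Using a conservative lift, such as the lower end of the confidence interval, guards against an overly optimistic estimate. The function name and all figures are hypothetical.

```python
def months_to_break_even(launch_cost, lift_per_user, users_per_month):
    """Months until the measured lift pays back a one-off launch cost.

    lift_per_user: extra value (e.g. revenue) per user per month; using a
    conservative estimate such as a CI lower bound is the safer choice.
    """
    if lift_per_user <= 0:
        return float("inf")  # the change never pays for itself
    return launch_cost / (lift_per_user * users_per_month)


# Hypothetical: $30,000 to launch, $0.02 extra per user, 100,000 users a month
print(months_to_break_even(30_000, 0.02, 100_000), "months")
```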
Suppose Google had found that none of the shades of blue was a good option. What next? This can happen when you test two variations and find that neither is satisfactory. You have still learned something: these options do not work for you. Think of something different, and design a new test based on what you learned from this one.
Post Launch Monitoring
You have run the test and figured out what works for your eCommerce business. Are you done? In a way, yes. But it is important to keep monitoring after launching a change and to keep collecting quality data on its effects, because the long-term effect can differ from the short-term effect. Data collected and analyzed over the following years may show different results. That data can also help you design your next A/B test.
A/B testing can be simple or very complicated. My goal here was to lay out a simplified overview of the A/B testing process. In real life, there is a lot of juggling between options and plenty of complex decision-making. But as a starting point, this should be a useful resource for learning how the A/B testing process works.