by Chris Low

A comprehensive guide to A/B testing

Opinion
Jul 06, 2017
App Testing | IT Leadership

How showing two or more different versions of a product to different users with A/B testing helps teams make better product decisions.

Making decisions about what features should go into a product is a difficult task. Often there is a huge difference between what you think users want and what they actually want. Sometimes users are vocal about their needs, but more often than not they simply start looking for alternatives. It is therefore the service provider’s duty to make the product pull users towards it, rather than expecting users to push themselves to adopt it. This involves making changes to the existing product in ways that increase conversion. But those changes also risk alienating the existing user base! It ultimately comes down to deciding whether the trade-off is worth it, and A/B testing helps with that decision.

What is A/B testing?

A/B testing is a strategy for showing two (or more) different versions of a product to different users. User A sees version A of the product, and user B sees version B. When done on a sufficiently large user base, the metrics generated from A/B testing can be used to determine which version performed better and by what margin. Developers can use A/B testing to evaluate new layouts, experimental features, and the removal of features that are no longer suitable for the application. Marketing teams can use it to compare click-through rates across email campaigns and landing pages. Designers can use it to compare steps in the conversion funnel, like user onboarding, engagement, and the bottlenecks where users sometimes decide not to sign up for the product. A/B testing is also often used to evaluate changes that are not directly visible to users, like a new backend algorithm, a move to a new server region, or a new CDN provider. There is a lot to be derived from A/B testing for every department in an organization.
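
To make the "user A sees version A" idea concrete, here is a minimal sketch of one common way to split users: hash the user id together with an experiment name so the same user always lands in the same bucket. The function name and the "signup_button_position" experiment are illustrative, not part of any particular tool.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing the user id together with the experiment name keeps the
    assignment stable across sessions and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user always lands in the same bucket for a given experiment.
print(assign_variant("user-42", "signup_button_position"))  # "A" or "B", but stable
```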

What are the processes that comprise A/B testing?

Identifying the required change

First is the fairly straightforward process of knowing what to test. This requirement can come from several different teams/departments.

Identifying the test target

The team responsible for carrying out the A/B test must also consider whether the changes need to be positioned towards a particular kind of audience. This might be based on user lifetime (new users vs. existing users), user activity (casual users vs. power users), user demographics, or any other relevant metric.
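
As a sketch of how such targeting might look in code, the eligibility check below enrolls only existing, reasonably active users. The `signup_date` and `sessions_last_30d` fields are placeholders for whatever your user store actually tracks.

```python
from datetime import date, timedelta

def is_eligible(user: dict, today: date) -> bool:
    """Target only existing, reasonably active users for this test."""
    is_existing = user["signup_date"] <= today - timedelta(days=30)
    is_active = user["sessions_last_30d"] >= 5
    return is_existing and is_active

user = {"signup_date": date(2017, 1, 15), "sessions_last_30d": 12}
print(is_eligible(user, today=date(2017, 7, 6)))  # True: this user would be enrolled
```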

Forming the null and alternative hypotheses

This is the first point where the data science team comes into play. Making new changes is pointless unless you know what metrics you want to generate from A/B testing. The metrics you want to track are often not the ones directly related to the changes you are making. The data science team decides what independent features need to be tracked. Once the features are finalized, two rival hypotheses are proposed: the null hypothesis and the alternative hypothesis. For example, the null hypothesis might be that the placement of the sign-up button does not significantly affect the number of sign-ups, and the alternative hypothesis that it does.
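
One way the data science team might turn the hypotheses into a concrete plan is to estimate how many users each variant needs before the test can detect the effect they care about. The sketch below uses statsmodels for a standard power calculation; the baseline rate, minimum detectable rate, significance level and power are made-up assumptions.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# H0: moving the sign-up button does not change the sign-up rate.
# H1: it does change the sign-up rate (two-sided test).
baseline_rate = 0.10          # assumed current conversion rate
minimum_detectable = 0.12     # smallest lift worth acting on (assumption)

effect_size = proportion_effectsize(baseline_rate, minimum_detectable)
users_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Need roughly {users_per_variant:.0f} users in each variant")
```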

Deploying the A and B versions

This is the part where the development and devops teams collaborate to make the changes available to the test targets. There are several tools that help with A/B testing, like VWO by Wingify and Firebase Remote Config by Google, but many organizations prefer an in-house version that ties into their existing deployment and data-collection pipelines.
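
For an in-house setup, the essential piece is recording which variant each user was actually exposed to, so that later analysis can join exposures with outcomes. The sketch below is one minimal way to do that; the experiment name, event fields, and the flat-file log are illustrative stand-ins for a real event pipeline.

```python
import hashlib
import json
import time

def serve_experiment(user_id: str, experiment: str, log_file="exposures.log") -> str:
    """Pick the user's variant and record the exposure for later analysis."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    variant = "A" if int(digest, 16) % 2 == 0 else "B"
    event = {
        "ts": time.time(),
        "user_id": user_id,
        "experiment": experiment,
        "variant": variant,
    }
    # In production this event would go to your metrics pipeline, not a flat file.
    with open(log_file, "a") as f:
        f.write(json.dumps(event) + "\n")
    return variant

variant = serve_experiment("user-42", "new_signup_flow")
# The application then serves version A or B of the feature based on `variant`.
```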

Metrics collection and testing the hypotheses

Once a sufficient amount of data is available from the previous step, the data science team can test the two hypotheses. The null hypothesis is then accepted or rejected based on the statistical significance of the differences observed between the A and B variants.
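
As an example of what that test might look like, the sketch below runs a two-proportion z-test on conversion counts using statsmodels. The counts are made-up numbers, and the 0.05 significance level is an assumed convention, not a prescription.

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up results: sign-ups and total users exposed, per variant.
conversions = [480, 530]    # variant A, variant B
exposures = [5000, 5000]

stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# Reject the null hypothesis only if p falls below the chosen significance level.
if p_value < 0.05:
    print("The difference between A and B is statistically significant.")
else:
    print("No significant difference detected; keep the current version.")
```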

Making decisions

This one sounds the easiest but is the hardest part. A/B testing is not always reliable because of the many varying parameters involved. Several points should be considered before relying on test results to make decisions.

What are some pitfalls in A/B testing?

More features do not always translate to a better product

Many product managers have a tendency to ship frequent feature additions, disregarding the fact that more features may not be what the product currently needs. Too many features increase the complexity of the application, often pushing away new users right at the beginning. At that point, many features have to be pushed into obscure corners of the product, making them practically undiscoverable for casual users. More features also mean more work for the development team, because they now need to maintain a much larger codebase. Adding new features only to remove them later also irks the users who have become accustomed to them. Used smartly, A/B testing can prevent this.

Making too many changes at once

A/B testing should ideally be A vs. B testing only. Too many changes at once can affect each other, even if they seem to be independent.

Add an option to opt-out

Do not force any user to live with changes to the product if they do not like them from the outset. Many companies have started allowing users to opt out of A/B testing. Some even go as far as including an experimental or labs section in the product where users can enable new beta features themselves; Gmail is one example.
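
A simple way to honor that preference is to check it before bucketing the user, as in the sketch below. The `opted_out` flag and `id` field are illustrative; read whatever your user-settings store actually provides.

```python
import hashlib

def assign_with_opt_out(user: dict, experiment: str, default: str = "A") -> str:
    """Give opted-out users the default experience instead of a test variant."""
    if user.get("opted_out", False):
        return default
    digest = hashlib.sha256(f"{experiment}:{user['id']}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

print(assign_with_opt_out({"id": "user-7", "opted_out": True}, "new_onboarding"))  # always "A"
```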

A/B testing can be an excellent strategy for gauging changes without harming the existing user base. As the product matures, the user base solidifies, and at that point new changes should generally be avoided until there is proof that they are actually for the good of the majority of users.