A/B Testing | Best Practices
Written by Christian Sokolowski

A/B testing, also known as split testing, is a controlled experiment for choosing the better of two or more options. In practice, it means serving two or more versions of the same page to visitors and identifying which one performs better with customers, leading to increased sales or conversions.


BEST PRACTICES

Run Tests for Sufficient Time

Run your A/B test for a minimum of 1-2 weeks to smooth out day-to-day variation and to confirm that results hold up from week to week. You also need to reach statistical significance, which is discussed below.

While 1-2 weeks is the minimum duration, consider running your test for 1-2 full business cycles so that every customer type is represented in the results.


Ensure You Achieve Statistical Significance

Statistical significance is crucial because it ensures that the results of a test are reliable and not simply due to chance. Reaching it requires that a sufficient number of visitors see each variation of the test. There are two common ways to check whether a test has reached statistical significance: calculate it manually using the underlying equations, or use a statistical significance calculator.

Online tools can help with both sides of this. The sample size calculator [https://www.evanmiller.org/ab-testing/sample-size.html] determines how many visitors each variation in the test needs, while the significance calculator [https://neilpatel.com/ab-testing-calculator/] computes significance from your test's performance numbers and supports multiple variations.
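
To see the kind of math such calculators run, here is a minimal Python sketch of the standard two-proportion sample-size formula, assuming a two-sided test at 5% significance and 80% power (common calculator defaults). The baseline conversion rate and target lift in the example are illustrative assumptions, not recommendations.

```python
# Sample-size estimate for a two-proportion A/B test (two-sided z-test).
# Assumptions: alpha = 0.05 (5% significance), power = 0.80 (80%).
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, min_detectable_effect,
                              alpha=0.05, power=0.80):
    """Visitors needed per variation to detect an absolute lift of
    `min_detectable_effect` over `baseline_rate`."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Illustrative example: 5% baseline conversion, detecting a 1-point lift.
print(sample_size_per_variation(0.05, 0.01))  # on the order of 8,000 per variation
```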

Most calculators will report statistical significance for you. If yours doesn't, or if you prefer manual calculations, look for a p-value of 0.05 (5%) or lower. That threshold corresponds to a 95% confidence level, meaning there is at most a 5% probability that the observed difference is due to random chance, which keeps your results unbiased and reliable.
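
If you want to do that manual check yourself, the sketch below runs a two-sided two-proportion z-test, one standard way to compute the p-value for an A/B test. The visitor and conversion counts are hypothetical numbers for illustration.

```python
# Manual p-value check for an A/B test (two-sided two-proportion z-test).
from statistics import NormalDist

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference in conversion rate
    between control (a) and variant (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled conversion rate
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: 500/10,000 control vs. 580/10,000 variant.
p = ab_test_p_value(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"p-value: {p:.4f}")  # ~0.0123, below 0.05, so significant at 95%
```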


Variations: How many are too many?

While Rebuy's A/B Testing feature allows for up to 10 total variations (1 control and 9 variations), it's important to note that using all 10 variations may not be suitable for every test. The amount of traffic your website receives and the duration of the test are crucial factors in determining the appropriate number of variations.

Statistical significance is dependent on the number of visitors who see each test variation. If your site has low traffic, it might be challenging to achieve statistical significance with a large number of variations. In such cases, it may be more sensible to use fewer variations to ensure reliable results. On the other hand, if your site has high traffic but you plan to run the test for a short duration, it is advisable to opt for fewer test variations to reach statistical significance.

As a starting point, it is generally recommended to begin with one variation and gradually increase the complexity of the test as you gather more data and insights.
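
To make the traffic trade-off concrete, here is a rough back-of-the-envelope sketch, assuming an even traffic split across variations and reusing a per-variation sample size like the one from the earlier sketch. The daily visitor count is an illustrative assumption.

```python
# Rough estimate of test duration: more variations means fewer visitors
# per variation per day, so significance takes proportionally longer.
def estimated_test_days(daily_visitors, num_variations, needed_per_variation):
    """Days until every variation (control included) reaches the
    required sample size, assuming an even traffic split."""
    per_variation_per_day = daily_visitors / num_variations
    return needed_per_variation / per_variation_per_day

NEEDED = 8_000   # e.g. a per-variation sample size from a calculator
for variations in (2, 5, 10):
    days = estimated_test_days(2_000, variations, NEEDED)
    print(f"{variations} variations: ~{days:.0f} days")
# 2 variations: ~8 days; 5 variations: ~20 days; 10 variations: ~40 days
```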


When to Run Tests

Running tests during seasonal events can lead to inaccurate results. Seasonally high traffic, such as during the holiday season, can skew test outcomes. If you use those results to make decisions in the following year's first quarter, when traffic returns to normal levels, you may find that the variation that won during the high-traffic period actually performs poorly under average conditions.

To protect against these negative impacts, it is advisable to consider the following strategies:

  1. Avoid running tests during seasonal events: Plan your tests outside of periods with significant seasonal fluctuations to ensure that the results are not influenced by temporary spikes in traffic.

  2. Run tests in full week increments: Conduct tests for complete weeks rather than shorter timeframes. This approach helps to capture a broader representation of traffic patterns and minimize the impact of daily or weekly variations.

  3. Include a variety of traffic types: Ensure that your test includes a mix of traffic types, including both paid and organic. This helps to account for different sources of traffic and reduces the risk of biased results that may arise from relying solely on one traffic source.

By following these practices, you can minimize the impact of seasonal events and obtain more reliable and accurate results from your A/B tests.


THINGS TO AVOID

Testing the impact of changes on your website or application is crucial for optimizing user experience. When conducting A/B tests (or A/B/n tests), it is important to test one variable at a time. But what exactly is a variable? A variable is the single element you change between versions, such as a headline, a button color, or the data source powering a widget.

While multivariate testing allows you to evaluate multiple variables simultaneously, it requires extensive research and sustained traffic to yield reliable results. Therefore, we strongly advise against engaging in multivariate testing without thorough preparation and adequate traffic volume.

To ensure the integrity of your A/B test data, be cautious of the following factors that can skew the results:

  1. Modifying your theme or the page being tested: Even minor changes like adding a banner can introduce unusable data, as they alter the presentation to customers.

  2. Conducting multiple tests on the same page: This practice resembles a form of multivariate testing, which introduces complexity and necessitates a significant volume of traffic to obtain trustworthy and actionable data.

By avoiding these pitfalls, you can maintain the accuracy and reliability of your A/B test results, enabling informed decision-making and optimal user experiences.

Measure By One Metric

Rebuy A/B Testing focuses on one metric (the Experiment metric) and selects the winner based on that metric for a solid reason: using two or more metrics skews the hypothesis, muddies the results, and increases the chances of seeing a false positive. Having a single metric to measure your experiment keeps you focused on what matters. It's fine to monitor other key metrics to make sure you don't tank them, but they should not factor into the final decision.
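
To see why extra metrics inflate false positives, here is a quick back-of-the-envelope illustration, assuming independent metrics each judged at the usual 5% significance threshold:

```python
# Chance of at least one spurious "win" when judging k independent
# metrics, each at a 5% significance threshold: 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 3, 5):
    false_positive = 1 - (1 - alpha) ** k
    print(f"{k} metric(s): {false_positive:.1%} chance of a false positive")
# 1 metric(s): 5.0%; 3 metric(s): 14.3%; 5 metric(s): 22.6%
```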

Correlation is Not Causation

Correlation and causation are easy to conflate, but the distinction matters. We'll spare you the technical explanation: simply put, correlation is a relationship in which two or more things tend to move together, whereas causation means one thing actually produces the other. For example, if revenue rises during your test at the same time as an unrelated marketing campaign, the campaign, not your winning variation, may be the real cause; treating that correlation as causation leads to the wrong decision.


USE CASES

Test Widget Data Sources

By testing the data source powering a widget using Rebuy's A/B Testing, you can explore various aspects and elements that can be personalized or modified. Testing the data source provides an opportunity to experiment with:

  1. Personalization: You can increase the level of personalization on the widget by customizing it based on user preferences, demographics, or past behavior. This can involve displaying personalized recommendations, offers, or content tailored to individual visitors.

  2. Language customization: You can test different language variations displayed on the widget to cater to the preferences of diverse customer segments. This can help optimize the messaging and communication for specific target audiences.

  3. URL-based customization: You can tailor the widget's content or presentation based on the URL from which visitors arrived. This allows you to deliver a more targeted experience based on the referring source, such as specific campaigns, landing pages, or search queries.

  4. Product variations: Testing different data sources enables you to showcase different products within the widget. This can involve highlighting specific product categories, featured items, or personalized product recommendations based on user preferences or browsing history.

Through these data source tests, you can make iterative improvements to the widget's performance, engagement, and conversion rates, while minimizing the risk to revenue by ensuring that changes are tested and validated before full implementation.


Test Different Widgets

If you have a hypothesis that a different widget type could outperform your current one, but you're hesitant to risk the revenue the existing widget generates, Rebuy's A/B Testing can help. It lets you test a different widget type alongside your existing one, so you can gather data and insights without jeopardizing current revenue.

By setting up an A/B test with the new widget type, you can compare its performance directly against the existing widget. This allows you to measure key metrics, such as engagement, conversion rates, and revenue, and make data-driven decisions about whether the new widget type is indeed more successful.

Rebuy's A/B Testing provides a controlled environment for testing different widget types, enabling you to explore new possibilities while mitigating risks associated with revenue loss.
