A/B testing is often seen as the holy grail of conversion optimization. In fact, many use the terms 'CRO' and 'A/B testing' almost synonymously. Unfortunately, conducting A/B tests is not a realistic option for every organization.
A/B testing means comparing two or more versions of a website and measuring how often visitors take a desired action (a 'conversion') in each version. This way you can determine which version converts better and put that version live. But why is this seen as such a powerful method within the online marketing field?
Pyramid of evidence
The pyramid of evidence shows the relative strength of findings obtained from research. Over the years, many variants of such a pyramid have been produced in both science and practice. However, most of them (also) cover research methods that are less common in conversion optimization, such as case-control studies, cohort studies, and cross-sectional studies. For that reason, a pyramid of evidence applied to the context of conversion optimization is shown above.
This 'conversion pyramid' shows five groups of research methods. The sides indicate, for each group, the quality of the evidence it provides and the risk of bias. Bias refers to unconscious influence on research results, causing them to no longer accurately reflect reality. Think, for example, of (unconscious) preconceptions of the researchers that lead to a particular component receiving an above-average amount of attention.
- Meta-analysis: These are analyses of large numbers of A/B tests. They examine whether additional insights can be gained by grouping the results of experiments carried out on certain themes or pages. This form of analysis is only possible when tests are systematically set up and documented.
- A/B testing: In the scientific context, these are usually called randomized controlled trials. This technique explicitly involves a random distribution of visitors across different variants of the website.
- Data analysis: This includes, for example, analyzing web statistics or data from the backend. Quasi-experiments also fall under this group; that topic is discussed in more detail later in the article.
- Observational study: The best-known example of this in the context of conversion optimization is user testing. The behavior of testers on the website is observed and analyzed.
- Expert opinion: This includes various techniques such as an expert review, competitor analysis, and desk research.
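The random distribution of visitors that defines an A/B test (a randomized controlled trial) is, in practice, often implemented with deterministic hashing, so that a returning visitor always sees the same variant. A minimal sketch in Python; the function name and the 50/50 split are illustrative assumptions, not the API of any particular testing tool:

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a visitor to a variant.

    Hashing visitor_id together with the experiment name yields a
    stable, pseudo-random split: the same visitor always gets the
    same variant, and different experiments split independently.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same visitor always lands in the same bucket:
assert assign_variant("visitor-42", "checkout-test") == assign_variant("visitor-42", "checkout-test")
```

Because the assignment depends only on the inputs, no per-visitor state needs to be stored, and over many visitors the buckets approach an even split.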
As the pyramid shows, A/B testing provides high-quality evidence with a low risk of bias. For that reason, this method is seen as an important part of the conversion optimization process, and websites and web shops that want to increase their conversion rate almost always want to include A/B testing in that process. But what do you do when this is not possible?
When A/B testing is not possible
To perform statistically reliable and valid A/B tests, your organization and website must meet a number of conditions. For example, you must have sufficient conversions per month, and as an organization you must be prepared to spend resources on setting up good tests. If these conditions are not met, tests will probably not be run properly, which makes them a costly activity.
Too few conversions
As a rule of thumb, you need at least 1,000 conversions (purchases, leads, donations) per month to set up a good A/B testing program. Websites or web shops that receive significantly fewer conversions than that fall below this threshold. You often see such lower conversion volumes at B2B websites, startups, or companies that operate in a niche. It is also possible that a company wants to research something for a specific segment, on a less visited landing page, in a rarely viewed position at the bottom of a page, or on an element with few interactions.
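The rule of thumb can be made concrete with a standard sample size calculation for comparing two conversion rates. A rough sketch in Python using the classical normal-approximation formula for two proportions; the baseline rate and uplift in the example are illustrative assumptions:

```python
from statistics import NormalDist

def sample_size_per_variant(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant to detect a change
    from conversion rate p1 to p2 (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Detecting a 10% relative uplift on a 2% baseline conversion rate
# requires on the order of 80,000 visitors per variant:
n = sample_size_per_variant(0.02, 0.022)
```

At a 2% conversion rate, that many visitors per variant corresponds to well over a thousand conversions per variant, which illustrates why low-traffic sites struggle to reach statistical significance within a reasonable test duration.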
Too few resources
In addition, it is possible that the website or web shop has enough conversions per month, but lacks sufficient resources. For example, management may not release enough budget for A/B testing, or suitable tools or employees may be missing. The organization must also have confidence in the added value of A/B testing and give it sufficient priority. If too many of these resources are lacking, good A/B testing may not be an option.
Methods for CRO research
Finding out where opportunities for website optimization lie can be done with a wide range of methods. Some of these methods are qualitative, others quantitative, and each has its own advantages and disadvantages. Sound CRO advice can therefore help determine which of these methods is most suitable.
- Heatmaps: There are tools that record which elements a visitor clicks or taps. This gives you insight into which elements are used most by visitors. You can also learn which elements visitors click on that are not clickable; sometimes these are elements that graphically look like a button but are not one.
- Scrollmaps: It is also possible to see how far down a page visitors scroll on average. This may reveal, for example, that an important element placed relatively low on the page is only seen by a small percentage of visitors. It is obviously important to segment this by desktop, tablet, and mobile traffic.
- Session replays: These provide a moving image of how a visitor moves around the website. By tracking mouse movements, scrolls, and clicks, a session replay tool can let you look over a visitor's shoulder, as it were. This can provide valuable insights into how the flow of a website or, for example, the checkout of an online store is structured.
- Interviews: This allows you to delve deeper into the 'why' of visitors or customers. For example, you can ask them why they chose you and not a competitor, or why they chose certain products and not others.
- Customer service: My experience is that many companies miss opportunities here. Support employees talk every day with people who run into all kinds of problems on the website. By including their knowledge in the optimization process, you can surface valuable insights that might otherwise never come up.
- Statistics: These provide insight into the 'what' of visitor behavior: which pages visitors view, on which devices, with which browsers, at which screen sizes, and so on. Commonly used tools for this are Google Analytics, Snowplow, Matomo, and Piwik.
- User tests: This often involves working with panels. Testers from the panel are asked to carry out assignments on the website and to think aloud as they do so.
- Focus groups: These are groups of people (usually from the target group) who are asked questions about the company. For example, these questions can be about the product range or the unique value proposition, but of course also about certain parts of the website.
- Surveys: A quantitative method to gain more insight into the thoughts of visitors or customers. In some cases, survey results can also be shared on social media, for example.
- Academic research: Insights from science are often well validated and can therefore be very valuable. Although they are not specifically applied to a particular company, a broader insight can help determine how a particular issue can be approached.
- Product reviews: Many web shops choose to let visitors post product reviews for the products. These can sometimes contain valuable information about opportunities to further optimize the webshop.
- Social media: It is also important to keep a close eye on social media channels. People sometimes post criticism about companies on their social media profiles. That information can sometimes be used to improve the website.
- Internal feedback: Employees work on the company's website every day. This ensures that they sometimes have unique feedback about optimization opportunities. Make sure that there is a suitable platform on which they can share this information with the product owners.
Alternatives to A/B testing
To get straight to the point: none of the methods below is a direct replacement for A/B testing. Just as A/B testing has unique advantages and disadvantages, so do these methods. However, they can be a useful tool for companies that cannot A/B test but still want to approach CRO as thoroughly as possible.
By using triangulation you combine different research methods to reduce the weaknesses and bias of each individual method. Triangulation is a research strategy that can help increase the validity and reliability of findings. This technique is mainly used in qualitative research, but also in quantitative research. In so-called mixed methods research (a combination of qualitative and quantitative) you always use methodological triangulation.
Norman Denzin identified four types of triangulation in the book 'Sociological Methods: A Sourcebook':
- Data triangulation: Using input from different times, places, and people.
- Researcher triangulation: Involving multiple researchers.
- Theory triangulation: The use of different theoretical perspectives.
- Methodological triangulation: The use of different research methods.
Examples of combinations
When triangulation is applied to research for conversion optimization purposes, you can use various combinations of the methods described above. For example, think of:
- Heatmaps + Session replays: Heatmaps provide a visually clear overview of where on a page all visitors click, while session replays show where an individual visitor clicks. For example, if you see a visitor click on something that is not clickable in a session replay, a heatmap can tell you whether that happens often or is a one-off problem for that visitor. Conversely, you can see in a heatmap where many clicks occur, and then use a session replay to see how visitors arrived at that click. This is an example of combining qualitative and quantitative research.
- Customer service + User testing: Suppose customer service regularly receives reports that visitors have difficulty using the filters on a category page. You can then set up a user test that largely takes place on exactly those pages and has testers repeatedly carry out assignments that require them to use the filters. This way you can see live how people interact with the filters and what their thoughts are about them. It is possible that the testers also provide ideas about solutions for how the filters could be improved.
- Academic research + Internal feedback: You use what theory says plus what you see in practice. The strength of academic research is that it is often well substantiated, but the weakness is that it has not been applied to the exact situation. Internal feedback, on the other hand, is often unsubstantiated, but very well applied to the company. In this way, the weaknesses of one method are partly offset by the strengths of the other, which will theoretically lead to more reliable insights.
- Expert review + Interviews: In an expert review, it can be difficult to sufficiently empathize with the target group for some points of interest. If, after conducting an expert review, you are able to conduct a few interviews with the target group, you can easily ask further questions about those points and find possible solutions for the points of interest.
Recurring user testing
Perform user testing on a regular basis to continuously collect feedback on the development of the website.
Many organizations that use user testing do so sporadically, almost always less than one round of testing per year. However, if you significantly increase the frequency of user testing, you can use this method to keep a finger on the pulse. If, in addition, you also increase the number of testers, you can even start thinking about quantitative (also called benchmarked) user testing. NN Group describes some of the advantages this brings, such as quantifying usability and being able to compare designs.
In practice, you can make continuous user testing as extensive and expensive as you want. One user test per quarter with five testers, for example, already allows you to gain UX insights on a recurring basis. Increase this to twenty testers at a time and you can observe trends in quantitative assessments of components. Increase the frequency further to one test per month and the method also becomes useful for, say, testing planned functionality before it is built.
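With around twenty testers per round, task success rates start to carry quantitative meaning, although the confidence intervals stay wide at such small samples. A sketch of the adjusted-Wald interval that is often recommended for small usability samples; the success counts in the example are illustrative:

```python
from statistics import NormalDist

def adjusted_wald(successes: int, n: int, confidence: float = 0.95):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a task
    success rate; behaves better than the plain Wald interval for
    the small samples typical of user testing."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 14 of 20 testers complete the checkout task: the true success
# rate plausibly lies anywhere between roughly 48% and 86%.
low, high = adjusted_wald(14, 20)
```

The width of such an interval is a useful reality check: with twenty testers you can track trends between rounds, but small differences between two designs will rarely be conclusive.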
Some advantages of user testing over A/B testing are that this method does not require scripts to be placed on the website and no technical knowledge is required to set up user tests.
Quasi-experiments
There is also the possibility of conducting quasi-experiments: experiments in which visitors are not randomly divided between the control group and variants.
Instead of the random distribution among variants that happens in an A/B test, an external circumstance is often used as the 'divider', most commonly time. The article The Experimentation Gap states that quasi-experiments often represent 10-30% of all experiments at companies that have invested in the infrastructure to enable this type of testing.
As described on Vista's blog, the company regularly uses quasi-experiments. These include so-called pre/post tests, in which the situation before a change is compared with the situation after it. They indicate that they have conducted more than 200 such tests in the past year. In addition to pre/post tests, more extensive quasi-experimental designs such as switchback, crossover, and time-discontinuity designs are also possible. On the surface, quasi-experiments seem an intuitive replacement for A/B testing, but because the design does not automatically filter out external factors, analyzing them reliably and validly can be challenging.
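At its simplest, a pre/post test boils down to comparing a conversion rate before and after a change. A minimal sketch of a two-proportion z-test in Python; the visitor numbers are illustrative, and note that this test alone cannot separate the effect of the change from external factors such as seasonality or campaigns:

```python
from statistics import NormalDist

def two_proportion_z(conv_before: int, n_before: int,
                     conv_after: int, n_after: int) -> float:
    """Two-sided p-value for the difference between the conversion
    rate before and after a change (normal approximation).

    Caution: unlike an A/B test, this does not control for external
    factors; a low p-value only means the rates differ, not why.
    """
    p1, p2 = conv_before / n_before, conv_after / n_after
    p_pool = (conv_before + conv_after) / (n_before + n_after)
    se = (p_pool * (1 - p_pool) * (1 / n_before + 1 / n_after)) ** 0.5
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 400 conversions out of 20,000 visitors before the change,
# 470 out of 20,000 after:
p_value = two_proportion_z(400, 20000, 470, 20000)
```

This is exactly why the analysis of quasi-experiments is harder than it looks: the arithmetic is simple, but attributing the measured difference to the change itself requires ruling out everything else that happened between the two periods.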
Validating optimization ideas for a website can be challenging. Large companies often use A/B tests for this, but for smaller organizations this is sometimes not possible for various reasons. This article has therefore discussed some options for gaining better insights and (partially) validating planned changes.