Guidelines for Consumer Testing - guidance from ESN members

In this series ESN members give their solutions to the most frequently asked questions from product developers and marketers.


Better than the competitor?

The question as to whether consumers like a given product less, equally, or more than a similar product (e.g. the competitor's offer) occurs when a new product is developed for a market where similar products are already established, or when an existing product is to be reformulated or optimized, and has to be compared to its original version.


"This comparative question can be explored in two ways", says Dr. Eliza Kostyra, sensory analyst and head of the sensory laboratory at the Faculty of Human Nutrition and Consumer Sciences, WULS, Poland.

  • by direct pair comparison of the products in a paired preference test or
  • by indirect comparison of the ratings that both products achieve independently from each other in a hedonic test which measures liking or acceptability.

“However,” Kostyra stresses, “each method has its own special features.”

“When two samples are compared, the simpler of these two possibilities is the paired preference test. Such a test should be performed with a group size of about 80-100 consumers representing the target population. Each participant gets two coded samples, A (the test product) and B (the competitive sample). The pairs are presented randomly in AB or BA order, and the test person is asked to indicate which one of the pair she/he prefers. Summarized individual results are compared with the statistical table for pair comparison (two-tailed test) to check statistical significance of the hedonic difference.”
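
The comparison with the statistical table can also be reproduced numerically. The following is a minimal sketch, assuming a forced choice (no "no preference" option) and an exact two-sided binomial test under the null hypothesis that each product is preferred with probability 0.5. The function name `paired_preference_p_value` is hypothetical and is not taken from the ESN guidance.

```python
import math


def paired_preference_p_value(prefer_a, n_consumers):
    """Exact two-sided binomial p-value for a paired preference test.

    Assumes every consumer chose either A or B (forced choice) and that,
    under the null hypothesis, each choice is equally likely.
    """
    if not 0 <= prefer_a <= n_consumers:
        raise ValueError("prefer_a must lie between 0 and n_consumers")
    # Because p = 0.5 makes the distribution symmetric, the two-sided
    # p-value is twice the tail probability of the larger count.
    k = max(prefer_a, n_consumers - prefer_a)
    upper_tail = sum(
        math.comb(n_consumers, i) for i in range(k, n_consumers + 1)
    ) / 2 ** n_consumers
    return min(1.0, 2 * upper_tail)


# Illustrative call: 61 of 100 consumers preferred the test product A.
p_value = paired_preference_p_value(61, 100)
print(f"two-sided p-value: {p_value:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) corresponds to a count that exceeds the critical value in the two-tailed table.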

In a paired preference test, three types of results are possible:

  • Significantly more consumers prefer sample A (the test product): in this case the manufacturer will probably be satisfied with the results achieved by the test product and see no need for further tests.
  • Significantly more consumers prefer sample B (the competitive product): in this case the manufacturer would want to know how the test product could be improved and to identify the relevant sensory attributes that need to be changed. To obtain such information, a quantitative descriptive analysis (QDA) would be the recommended next step.
  • The test results are equally distributed between both samples: this suggests that there is no significant difference between the two products. The most common interpretation of such results is “both samples are equally liked”. “However,” Dr. Eliza Kostyra stresses, “such a result could also mean that samples A and B differ (in the hedonic dimension) but that there are two equal sub-groups of consumers with clearly different preference patterns for either product A or B, due to qualitative and quantitative differences between the products.”


A quantitative descriptive analysis (QDA) can be performed as a next step to investigate this question further. Only when the QDA does not reveal any significant differences in the sensory profile of products A and B can it be concluded that both products are comparable. However, if some qualitative or quantitative differences in the sensory profile (aroma / texture / flavour) between the products are revealed, this strongly indicates that the tested consumer group is not homogeneous, but segmented into two equal sub-groups with different preference patterns. For the manufacturer, this can mean that the test product might be successful only as a niche product for a certain sub-group of consumers.

Tünde Kuti and Adrienn Hegyi of Campden BRI Hungary state that, “Compared to the paired preference test, hedonic rating tests have the advantage that they can not only be used to measure the overall liking or disliking of the products to be compared; they can also be used to measure the liking or disliking of specific attributes such as appearance, flavour and texture. However, the order in which the questions are asked will affect the quality of the test results. It is good practice to start with questions about overall likes and dislikes, and then move on to the more specific questions.”

The nine-point hedonic scale is a widespread rating scale that has been used for many years to collect data about the acceptance of food and to provide a benchmark against which results can be compared. An alternative approach is to use line scales with anchors on the ends of the scales, “extremely acceptable/like” and “extremely unacceptable/dislike”. The line scale provides a continuously graded choice of alternatives, affording the consumers a wider range of choices. This allows a more precise comparison between test samples. The translation of detailed, labelled category scales into other languages may cause difficulties. If this is the case, line scales can be applied successfully to prevent misinterpretation. The marks on line scales can easily be converted into numbers by using a software program.

The participation of at least 80-100 consumers is recommended for hedonic tests but different products may have their own requirements; larger panels are needed as the variability of the tested products increases. The samples should be presented in a balanced order so that each sample appears in a given position an equal number of times.
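
For two samples, a balanced presentation order means that half of the consumers receive the order AB and the other half BA. The following sketch shows one way to generate such a plan. The function name `balanced_orders` and the `seed` parameter are hypothetical, and the approach (an even split followed by shuffling) is an assumption rather than a procedure prescribed by the ESN members.

```python
import random


def balanced_orders(n_consumers, seed=None):
    """Return one presentation order per consumer, split evenly between
    ("A", "B") and ("B", "A"), in random sequence.

    With an odd number of consumers, the extra order is picked at random,
    so the two counts differ by at most one.
    """
    rng = random.Random(seed)
    orders = [("A", "B"), ("B", "A")] * (n_consumers // 2)
    if n_consumers % 2:
        orders.append(rng.choice([("A", "B"), ("B", "A")]))
    rng.shuffle(orders)  # randomise which consumer gets which order
    return orders


# Illustrative call: a serving plan for 100 consumers.
plan = balanced_orders(100, seed=1)
print(plan[:5])
```

Each sample then appears in the first position and in the second position an equal number of times.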

The evaluation of the results shows whether the liking or acceptability of the test product is significantly different from that of the benchmark product. The attribute-specific questions show the detailed performance of the product, which helps to interpret and highlight its strengths and weaknesses compared to the benchmark sample.

With regard to the statistical analysis of the results, Tünde Kuti recommends:

“One should bear in mind that the different types of scales require different statistical approaches. When hedonic scales with category descriptions such as “like extremely” and “dislike extremely” are used for the collection of consumer-liking data, it is useful to summarize the ordinal data by means of histograms, illustrating the frequency of use of each point on the scale. The histograms reveal polarized ratings or skewed data.

The evaluation of the test product against the benchmark sample can be done by using non-parametric tests to discover whether there are significant differences between the products. If the data for the two products are collected from the same consumers, then the paired Wilcoxon signed-rank test should be used. If the test product and the benchmark product are evaluated by two different groups of consumers, then the Mann-Whitney U test would be appropriate.
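
Editorial illustration: both non-parametric tests can be sketched with the Python standard library. The version below is a simplified sketch that uses the normal approximation with a correction for ties and no continuity correction, which is an assumption suited to panels of roughly 80-100 consumers; statistical packages offer exact and more refined variants. All function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t^3 - t over groups of t tied values."""
    return sum(t ** 3 - t for t in Counter(values).values())


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def wilcoxon_signed_rank(test, bench):
    """Same consumers rated both products. Returns (W+, p-value)."""
    diffs = [a - b for a, b in zip(test, bench) if a != b]  # drop zeros
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term(abs_diffs) / 48
    return w_plus, two_sided_p((w_plus - mean) / math.sqrt(var))


def mann_whitney_u(group_a, group_b):
    """Different consumers rated each product. Returns (U_a, p-value)."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    total = n1 + n2
    var = n1 * n2 / 12 * ((total + 1) - tie_term(pooled) / (total * (total - 1)))
    if var <= 0:  # all ratings identical
        return u_a, 1.0
    return u_a, two_sided_p((u_a - n1 * n2 / 2) / math.sqrt(var))


# Illustrative nine-point ratings (made-up values, not study data).
test_scores = [7, 8, 6, 9, 7, 8, 7, 8, 9, 6]
bench_scores = [5, 6, 5, 6, 5, 6, 4, 6, 7, 5]
print(wilcoxon_signed_rank(test_scores, bench_scores))
print(mann_whitney_u(test_scores, bench_scores))
```

With nine-point category data, ties are frequent, which is why the variance terms include the tie correction.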

When continuous line scales, e.g. with the anchors “extremely acceptable” and “extremely unacceptable”, are used for data collection, the results can be summarized by calculating summary statistics (e.g. mean, standard deviation, interquartile range). Graphical methods (histograms, bar charts) can be useful for visual representation of the results.

The results for the test product can be tested for significant differences against the benchmark. If the data show normal distribution (e.g. are not skewed or bipolar), parametric statistical methods can be used; otherwise non-parametric methods should be used. If the data for the two test products are obtained from the same assessors, then the paired t-test would be appropriate. If the data for the two products are collected from different groups of consumers, then the independent t-test should be used.”
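
The two parametric cases can be sketched as follows. This is a minimal illustration that computes only the t statistic and its degrees of freedom; the p-value would then be read from a t table or a statistics package. The independent version assumes equal variances in both groups (pooled-variance Student's t-test), which is an assumption not stated in the guidance, and the function names are hypothetical.

```python
import math
import statistics


def paired_t(test, bench):
    """Same assessors rated both products. Returns (t, degrees of freedom)."""
    diffs = [a - b for a, b in zip(test, bench)]
    n = len(diffs)
    # Standard error of the mean difference.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


def independent_t(group_a, group_b):
    """Different consumer groups rated each product.

    Uses the pooled variance, i.e. assumes both groups share the same
    underlying variance. Returns (t, degrees of freedom).
    """
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * statistics.variance(group_a)
        + (n2 - 1) * statistics.variance(group_b)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, n1 + n2 - 2


# Illustrative line-scale scores (made-up values, not study data).
test_scores = [6.0, 7.0, 8.0, 9.0, 10.0]
bench_scores = [5.0, 5.0, 7.0, 7.0, 9.0]
print(paired_t(test_scores, bench_scores))
print(independent_t(test_scores, bench_scores))
```

The paired version works on the per-consumer differences, so it removes the variation between consumers that the independent version has to absorb.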

Adrienn Hegyi emphasizes that:

  • Under no circumstances should a trained panel be asked to evaluate the acceptability or preference of the product, as trained panel members do not behave as naive consumers.
  • It should be emphasized that acceptability and preference are not the same thing; a consumer may prefer product A to product B, but may find them both unacceptable.

To conclude, Eliza Kostyra summarizes the differences between the two methods as follows:

The main advantages of a paired preference test are that:

  • it is a simple task that is easy and quick to perform,
  • it sensitively discriminates differences between two compared samples, and
  • it allows for a simple statistical verification of the results, which only have to be compared with the appropriate statistical table.

She considers it a disadvantage that the results obtained by this method are of relative character and do not give information concerning the acceptability level of the compared samples.

The main advantages of hedonic rating tests are that the obtained results deliver approximate information relating to the acceptability level of the compared samples and the different product attributes. Such tests also yield information with regard to the hedonic distance between the samples.

Disadvantages of hedonic rating tests are that:

  • they need more time to be performed,
  • the individual results are more dispersed, and
  • a verification of the statistical significance of differences in hedonic rating data needs more time.