Number of Evaluators

In general, a single evaluator should not perform a heuristic evaluation alone, since one person will never find all the usability problems in an interface [13].  In fact, experiments from several projects indicate that any single evaluator will miss most of the usability problems.  For example, the results from six case studies [12, 13] show that single evaluators found only approximately:

Note: Major usability problems are defined as those that have serious potential for confusing users or causing them to use the system erroneously, while minor problems may slow down the interaction or inconvenience users unnecessarily [5]. (refer to severity ratings)

The above results will vary with the difficulty of the interface being evaluated and the expertise of the inspectors, but in all cases, relying on a single evaluator yields poor results.  Some argue that this level of performance makes a single-evaluator heuristic evaluation unacceptable in a usability engineering project [11].  However, these results are not all that bad: finding some problems is much better than finding none, especially when the design schedule only allows a heuristic evaluation by a single inspector.  A heuristic evaluation can also be supplemented with other usability engineering methods to increase the total number of problems found (refer to "Heuristic evaluation and other techniques").

It is possible to improve the effectiveness of heuristic evaluations significantly by involving multiple evaluators.  Experience from many different projects has shown that different people find different usability problems [12,13].  Also, studies have shown that there is a substantial amount of non-overlap between the sets of usability problems found by different evaluators [13]. A general recommendation is to use between 3 and 5 evaluators [12,13].  In general, it is expected that five evaluators will find from two-thirds to three-quarters of the usability problems in an interface [10].  This is a considerable improvement when compared to the results from single evaluators.
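This diminishing-returns pattern is often modeled by assuming each evaluator independently detects any given problem with some probability p, so i evaluators find a 1 - (1 - p)^i fraction of all problems.  A minimal sketch (the detection rate p = 0.22 is an illustrative assumption chosen so that five evaluators land in the two-thirds to three-quarters range cited above; it is not a figure from this text):

```python
def proportion_found(i, p=0.22):
    """Expected fraction of usability problems found by i evaluators,
    assuming each independently detects a given problem with probability p."""
    return 1 - (1 - p) ** i

# Tabulate the diminishing returns from adding evaluators.
for n in range(1, 6):
    print(f"{n} evaluator(s): {proportion_found(n):.0%}")
```

Under this assumption, each additional evaluator adds less than the one before, which is why recommendations cluster in the 3-to-5 range rather than growing without bound.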

It must be realized that this number of evaluators is not a magic number; the optimum depends on many factors, such as budget, schedule, and resources.  To help determine the optimal number of evaluators, a cost/benefit ratio for the heuristic evaluation can be formulated.  The first step is to determine the cost of using the method, considering both fixed and variable costs [13]:

  1. Fixed costs: paid no matter how many evaluators are used. Includes time to plan the evaluation, get the materials ready, and write up the report.
  2. Variable costs: the costs each time an additional evaluator is used. Includes evaluators' salaries, cost of analyzing evaluators' reports and resources used.

Naturally, these costs will vary depending on each company's cost structure and the complexity of the interface being evaluated.

Benefits derive mainly from the usability problems found, and can be quantified by summing the severity scores of those problems.  Educational benefits may also be realized: evaluators increase their understanding of usability by comparing their own evaluations with those of others.

An analysis of the usability problems described in 11 published projects concluded that the maximum cost/benefit ratio for a medium-to-large project is achieved by performing a heuristic evaluation with 4 evaluators [1].  This supports heuristic evaluation as a discount usability engineering method that can be applied on a modest budget.
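One way to make this cost/benefit reasoning concrete is to combine the fixed/variable cost model with a detection-rate model and look for the evaluator count that maximizes the benefit-to-cost ratio.  All figures below (fixed cost, per-evaluator cost, number of problems, value per problem, detection rate) are invented for illustration, not data from this text; real values would come from a company's own cost structure:

```python
def proportion_found(i, p=0.22):
    """Fraction of problems found by i evaluators, assuming each
    independently detects a given problem with probability p (illustrative)."""
    return 1 - (1 - p) ** i

def cost(i, fixed=3000, variable=900):
    """Fixed costs plus per-evaluator variable costs (illustrative figures)."""
    return fixed + variable * i

def benefit(i, total_problems=40, value_per_problem=1000):
    """Value of the problems found (illustrative figures)."""
    return proportion_found(i) * total_problems * value_per_problem

def best_ratio(max_evaluators=10):
    """Evaluator count with the highest benefit/cost ratio."""
    return max(range(1, max_evaluators + 1),
               key=lambda i: benefit(i) / cost(i))

print(best_ratio())
```

With these made-up numbers the ratio happens to peak at 4 evaluators; with a different cost structure or a harder interface, the optimum shifts, which is exactly why the ratio should be recomputed per project rather than treated as a constant.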

Nielsen [14] compared the individual differences between evaluators performing heuristic evaluations and found that most evaluators find an average number of usability problems, a few do very well, and a few do rather poorly.  The differences in performance between individual evaluators grow larger as the interface becomes more difficult to evaluate.

Ultimately, the optimum number of evaluators will depend on the trade-off between the increased cost of each extra evaluator and the cost of leaving usability problems unidentified.
