Heuristic evaluation and other evaluation techniques

Comparing evaluation techniques

It is natural to ask how heuristic evaluation compares with other usability evaluation techniques, both inspection-based and empirical.  Several studies have compared different methods in an attempt to determine whether one is better than another.

A study by Desurvire et al. (as described in [2]) compared the effectiveness of empirical usability testing and heuristic evaluation in identifying violations of usability guidelines.  Laboratory testing found that 6 of the 10 guidelines were violated, whereas heuristic evaluation identified only one violation.

In another study (also described in [2]), Desurvire et al. compared the results of three different types of evaluation to laboratory testing results.  The heuristic evaluation results predicted the laboratory testing results better than the cognitive walkthrough method did.  The differences were largely attributable to the performance of the human factors experts who served as evaluators.

Jeffries et al. [4] compared four techniques, each applied by a different group, to evaluate the user interface of a software product prior to its release.  The four methods were heuristic evaluation, software guidelines, cognitive walkthroughs, and usability testing.  Overall, heuristic evaluation produced the best results: it identified the most usability problems, including one-third of the most severe problems and two-thirds of the least severe.  Because these serious problems were found by a few UI specialists at relatively little cost, heuristic evaluation had a distinct cost/benefit advantage.  The overlap between any two of the four methods, in terms of usability problems identified, was only about 10 to 15 percent.
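One common way to quantify overlap between two methods is the proportion of all distinct problems that both methods found (the Jaccard index); whether the study used exactly this measure is not stated here.  A minimal sketch of the calculation, using hypothetical problem IDs:

    # Illustrative sketch (hypothetical data): quantifying the overlap between
    # the sets of usability problems identified by two evaluation methods.

    def overlap(found_a: set, found_b: set) -> float:
        """Jaccard index: problems found by both / all distinct problems found."""
        union = found_a | found_b
        return len(found_a & found_b) / len(union) if union else 0.0

    # Hypothetical problem IDs found by two methods on the same interface.
    heuristic_eval = {"P01", "P02", "P03", "P07", "P09"}
    usability_test = {"P02", "P05", "P07", "P11", "P12"}

    print(f"overlap = {overlap(heuristic_eval, usability_test):.0%}")  # prints 25%

By such a measure, an overlap of 10 to 15 percent means the methods surfaced largely complementary sets of problems.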

Karat et al. [5] conducted a similar study comparing empirical testing with individual and team walkthroughs.  The walkthroughs in this study were essentially heuristic evaluations, since each evaluator used 12 usability guidelines (8 of which came from Nielsen's original set of heuristics) to find usability problems.  Contrary to the study above, the empirical testing condition identified the largest number of problems, including a significant number of relatively severe problems that the walkthrough conditions missed.  Empirical testing was also considered more cost-effective, since it required the same or less time per problem identified than the walkthroughs did.  About a third of the significant usability problems identified were common to all three methods.
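The cost-effectiveness comparison reduces to a simple ratio: total evaluation time divided by the number of problems identified.  The sketch below illustrates that calculation; the person-hour and problem-count figures are hypothetical, chosen only to mirror the direction of the reported result:

    # Illustrative sketch (hypothetical figures): cost-effectiveness measured
    # as person-hours spent per usability problem identified.

    methods = {
        # method: (total person-hours, problems identified)
        "empirical testing":      (120, 60),
        "team walkthrough":       (80, 32),
        "individual walkthrough": (50, 18),
    }

    for name, (hours, problems) in methods.items():
        print(f"{name:>23}: {hours / problems:.1f} person-hours per problem")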

One can only speculate about the reasons for the differences between the latter two studies. That user testing was more cost-effective in one study and less in the other may be due to differences in the walkthrough procedures and in how the data were analyzed. Differences in evaluator expertise, or in the length of time allotted to the heuristic evaluations, could explain why the methods differed in how many usability problems they found.  The higher degree of overlap in the Karat experiment may stem from fundamental differences between the sets of scenarios used in the two studies.

As can be seen, numerous factors affect the relative success of any individual method, which makes it very difficult to compare different evaluation methods.  The best approach is to remember that each method has strengths and weaknesses that depend on how it is applied.

Heuristic evaluations and user testing

As a discount usability engineering method, heuristic evaluation does not guarantee finding every last usability problem in an interface [13].  For example, if the system is highly domain-dependent and the evaluators have little domain expertise, usability problems are likely to be overlooked.  Similarly, since evaluators have no direct knowledge of the actual users and their tasks, some usability problems will be missed [13].  To compensate for the shortcomings of any one method, several usability methods should be combined on each project, with the exact mix depending on the characteristics of the system in question.  A common recommendation is to pair heuristic evaluation with some form of user testing.  Typically, one would first perform a heuristic evaluation to clean up the interface and remove as many usability problems as possible.  After a redesign, the interface would be subjected to user testing, both to check the outcome of the iterative design step and to find any remaining usability problems that the heuristic evaluation did not pick up. Alternating between heuristic evaluation and user testing has two main advantages [13]:

  1. a heuristic evaluation can eliminate a number of usability problems without the need to "waste users", who can be difficult to find and schedule in large numbers; and
  2. these two categories of usability evaluation methods have been shown to find fairly distinct sets of usability problems.

Heuristic evaluation should be used alongside empirical user testing, not as a replacement for it.
