Identifying usability problems through a heuristic evaluation is the first step towards eliminating them and improving the interface. Once this step has been taken, a severity rating should be assigned to each problem. Ranking usability problems by severity helps to determine which ones should be addressed, given that not all problems can be fixed under the constraints of the design life cycle (e.g. budget and schedule). The ratings also help in allocating resources to address the user interface problems [5].
Before a usability problem can be rated, severity itself must be defined. According to Nielsen [13], severity is a combination of three factors: frequency, impact, and persistence. Frequency ranges from common problems to rare ones. Impact describes how easy or difficult it is for a user to overcome a problem. Finally, persistence ranges from a one-time problem that the user can overcome to a problem that recurs continually and becomes tiresome to the user. Naturally, the severity of a usability problem increases as the levels of these factors increase. To facilitate the rating process, the three factors are combined into a single severity rating that gives an overall assessment of each usability problem.
As previously mentioned, Nielsen [13] states that evaluators have difficulty formulating severity ratings during the evaluation itself because they are more focused on finding new usability problems. A further disadvantage of producing severity ratings during the evaluation is that an individual evaluator will not find all the usability problems in the system. The ratings would therefore cover only the problems that evaluator found and would be incomplete. To resolve this, severity ratings for all the usability problems can be collected by sending a questionnaire to each evaluator once the evaluation process has been completed [13].
There is some concern that evaluators may be biased when rating severity. It might be expected that evaluators would rate the problems they found themselves as more serious. However, Nielsen [13] found that any given evaluator's severity rating of a usability problem was essentially independent of whether that evaluator had found that problem. There was a positive correlation between the evaluators' ratings and the number of evaluators who had found each problem [13]. This correlation is not due to bias in the severity assessment, since individual evaluators do not know how many other evaluators found each problem. Rather, it reflects the fact that the more severe usability problems are found more often in heuristic evaluations [13].
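The relationship described above can be checked on one's own evaluation data with a short calculation. The following is a minimal sketch in Python, not the analysis used in [13]: the function name `pearson`, the variable names, and all of the numbers are hypothetical and invented purely for illustration. It computes the Pearson correlation between the mean severity rating of each problem and the number of evaluators who found that problem.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only (one entry per problem).
mean_severity = [3.5, 1.0, 2.5, 0.5, 3.0]  # mean rating given to each problem
found_by = [4, 1, 3, 1, 3]                 # evaluators who found each problem

r = pearson(mean_severity, found_by)
print(f"correlation between severity and detection count: {r:.2f}")
```

A positive value of `r` on real data would be consistent with the finding that more severe problems tend to be found by more evaluators; it says nothing by itself about rating bias.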
Again, the reliability of the severity ratings depends on the number of evaluators used. Ratings from three to four evaluators would seem to be satisfactory for many practical purposes. Severity ratings from a single evaluator are considered biased and too unreliable to be trusted [13]. One must remember, however, that in some circumstances a single evaluator may be the only resource available to perform a heuristic evaluation.
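One simple way to pool the questionnaire responses of several evaluators is to average the ratings for each problem and rank the problems by that average. The sketch below assumes this approach; the text above does not prescribe a particular aggregation method. The function name `rank_by_mean_severity`, the evaluator and problem labels, the 0 to 4 rating scale, and the ratings themselves are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def rank_by_mean_severity(ratings):
    """Average each problem's ratings across evaluators and rank the problems.

    ratings maps evaluator name -> {problem id -> severity rating}.
    Returns a list of (problem id, mean rating), most severe first;
    ties are broken alphabetically by problem id.
    """
    per_problem = {}
    for evaluator_ratings in ratings.values():
        for problem, score in evaluator_ratings.items():
            per_problem.setdefault(problem, []).append(score)
    averaged = {problem: mean(scores) for problem, scores in per_problem.items()}
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical questionnaire results from three evaluators (assumed 0-4 scale).
ratings = {
    "evaluator_a": {"P1": 2, "P2": 4, "P3": 1},
    "evaluator_b": {"P1": 3, "P2": 3, "P3": 0},
    "evaluator_c": {"P1": 1, "P2": 4, "P3": 2},
}

ranked = rank_by_mean_severity(ratings)
for problem, score in ranked:
    print(f"{problem}: mean severity {score:.2f}")
```

Because every evaluator rates every problem in the questionnaire, each mean is taken over the same number of ratings, which keeps the ranking comparable across problems.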