The Extent of External and Internal Validity
for the Cognitive Walkthrough
Measures of internal and external validity are important
in any evaluation method. External validity is an index of the extent
to which the findings can be generalized to the real world, while internal
validity is an indication of how well the method is evaluating that which
it intends to evaluate. For a usability inspection method, internal validity
can be conceptualized as whether the problems identified are adequate
predictors of real user problems.
The cognitive walkthrough (CW) possesses external validity to the extent
that another system resembles the one inspected, both in complexity
and in the tasks analyzed. Since this similarity rarely occurs, the findings
of a CW appear to be externally valid only for the evaluated system,
in the specified context, and with the users identified during the CW.
The internal validity of the CW is difficult to ascertain because it is
hard to know whether the CW is really testing what it is supposed to. The CW
is meant to evaluate task sequences, but if these sequences are
inappropriate or unsuitable then the results of the CW will not be
valid. Having the inspection team identify and describe the user
population also exposes the CW to internal validity threats.
What if the users are not described properly and the CW is conducted based
on this information? In this case, the results would rest on
an inaccurate user profile, and the problems identified may not be representative
of those actually inherent in the system. Unfortunately, these validity
problems stem from the CW process itself and are therefore difficult to
eliminate. Presumably, having a team of inspectors provides consensus
checks on the task sequences and user profiles, and this may help to reduce
threats to validity.
Jeffries et al. (1991), in a comparison of four usability evaluation
techniques, found that heuristic evaluation produced problem reports
that appeared to be better predictors of end-user problems
(discovered in laboratory testing) than those from either the CW or
guideline-based inspections. This suggests that heuristic evaluation may be
more internally valid than the CW or guideline-based inspections.
Therefore, the CW appears to have several validity problems associated
with it, and as a result it should be reserved for situations that truly
warrant its use.
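
One way to make the idea of problems as "predictors of real user problems"
concrete is to compare the set of problems an inspection reports with the set
observed in laboratory testing. The sketch below is only an illustration: the
function name, the two measures (hit rate and coverage), and the problem
identifiers are all assumptions made for this example and are not taken from
Jeffries et al. (1991) or from the CW literature.

```python
def predictive_validity(predicted, observed):
    """Return (hit_rate, coverage) for a set of predicted problems.

    hit_rate: share of predicted problems that users actually encountered.
    coverage: share of observed user problems that the method predicted.
    Both measures are illustrative assumptions, not a standard metric.
    """
    predicted, observed = set(predicted), set(observed)
    confirmed = predicted & observed  # problems both predicted and observed
    hit_rate = len(confirmed) / len(predicted) if predicted else 0.0
    coverage = len(confirmed) / len(observed) if observed else 0.0
    return hit_rate, coverage


# Hypothetical problem identifiers, invented purely for illustration.
walkthrough_problems = {"P1", "P2", "P3", "P4"}
lab_test_problems = {"P2", "P3", "P5", "P6", "P7"}

hit_rate, coverage = predictive_validity(walkthrough_problems, lab_test_problems)
print(f"hit rate: {hit_rate:.2f}, coverage: {coverage:.2f}")
# prints "hit rate: 0.50, coverage: 0.40"
```

A low hit rate would suggest that the inspection is reporting problems users
do not actually experience, while low coverage would suggest that it is missing
problems they do experience. Either result bears on how well the method is
evaluating what it intends to evaluate.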