The Extent of External and Internal Validity for the Cognitive Walkthrough
 
    Internal and external validity matter for any evaluation method.  External validity is the extent to which findings can be generalized to the real world, while internal validity indicates how well the method evaluates what it intends to evaluate.  For a usability inspection, the latter can be framed as whether the problems identified are adequate predictors of real user problems.  The cognitive walkthrough (CW) possesses external validity only to the extent that another system resembles the inspected one in complexity and in the tasks analyzed.  Since such similarity rarely occurs, the CW's findings appear to be externally valid only for the evaluated system, in the specified context, and for the users that the inspection team identified.  The internal validity of the CW is difficult to ascertain because it is hard to know whether the CW is really testing what it is supposed to test.  The CW evaluates task sequences, so if these sequences are inappropriate or unsuitable, its results will not be valid.  Having the inspection team identify and describe the user population also exposes the CW to internal validity threats.  If the users are not described properly, the walkthrough rests on an inaccurate user profile, and the problems identified may not be representative of those actually inherent in the system.  Unfortunately, these validity problems stem from the CW process itself and are therefore difficult to eliminate.  Presumably, having a team of inspectors provides consensus checks on the task sequences and user profiles, which may help to reduce validity errors.

    Jeffries et al. (1991), comparing four usability evaluation techniques, found that heuristic evaluation produced problem reports that appeared to be better predictors of end-user problems (discovered in laboratory testing) than those of either the CW or guideline-based inspections.  This suggests that heuristic evaluation may be more internally valid than the CW or guideline-based inspections.  The CW therefore appears to have several validity problems, and it should be reserved for situations that truly warrant its use.
