User interface consistency:
an evaluation of original
and revised interfaces
for a videodisk library

Richard Chimera
Ben Shneiderman

Abstract

Original and revised versions of the National Library of Medicine MicroAnatomy Visual Library system were evaluated with an empirical test of nineteen subjects. The versions of the program's interface differed on issues relating to consistency of wording and screen layout, use of color coding, display of status information, and availability of help information. Each subject used both versions of the program to perform matched sets of tasks. The dependent variables were time to perform tasks correctly and subjective satisfaction as reported via the QUIS questionnaire. The revised version was statistically significantly faster for five of twenty tasks and more satisfying to use on a number of dimensions. The benefits of consistency and guidelines for design of interactive computer systems are discussed.

Introduction

Interactive computer systems are common in professional environments and are becoming more widely used in library and public information settings, such as online catalog systems, local area maps, and museum exhibit information. For these systems to be used effectively by untrained users the user interface must be carefully designed. Important considerations are: the multiple dimensions of consistency, cognitively-sound structuring, close correspondence of functionality to user goals, and small sets of choices provided to users at any one time (Lewis, et al. 90; Reisner 90; Nielsen 89; Kearsley 93).

Initial informal references to consistency have turned into ambitious attempts at formal definitions that get more elusive as they are scrutinized (Reisner 90; Wiecha et al. 90). Kellogg (Kellogg 89) points out that "Consistency has no meaning on its own; it is inherently a relational concept. Therefore to merely say that an interface is consistent or that consistency is a goal of user interface design is also meaningless." The issue of defining consistency has even started a heated community debate (Grudin 92; Wiecha 92). It is now commonly accepted that when a competent user's view of the system differs from the designer's view of the system, then the system is inconsistent (Reisner 90; Grudin 89). The interface design community agrees that the user's tasks and application domain are a major focus for providing consistency. At the same time, the community acknowledges that adhering too much to physical metaphors and the status quo can limit an interface's usefulness by potentially ignoring inherent advantages of the computer medium.

There is also a widely held belief that internal consistency (e.g., layout, terminology, color, etc.) is a crucial issue in the usability of highly interactive computer programs (Shneiderman 92; Reisner 90; Nielsen 89). Nielsen states that consistency leads to "improved user productivity by leading to higher throughput and fewer errors because the user can predict what the system will do in any given situation and can rely on a few rules to govern use of the system." Further, he points out "it is desired to have the system be consistent with users' expectations whether formed by other applications or by non-computer systems." More encouragement for consistent design can be found in various guidelines documents (Brown 88; Smith & Mosier, 86).

The goal of this project was to validate empirically that modest changes to an interface to make it more consistent with respect to the users' domain and task context would increase comprehension, thereby decreasing completion times and increasing subjective satisfaction.

History

The NLM MicroAnatomy Visual Library system is an interactive computer system that allows users to view videodisk images of human cell structures. The images are accessed in a number of ways: by word search, by videodisk frame number, and by prepared slideshows. It was created in 1987 by the National Library of Medicine to be used in medical schools and libraries by students and professors. These users are knowledgeable about medicine but not necessarily about computers.

Figure 1. Main menu screen in the original interface. Menu items are verbose and use computer-oriented rather than task-oriented language.

NLM submitted the original version of this program to the Human-Computer Interaction Laboratory for an evaluation (i.e., Figures 1-3). Usability studies were performed and the results were the basis for suggestions to improve the user interface (Young & Shneiderman 89). The suggested improvements focus on internal consistency and harmony with users' application domain, expectations, and tasks. NLM revised the interface and challenged us to prove whether the changes would make a difference.

Despite the obvious utility of comparing original and revised versions of an interface to see which is faster, more comprehensible, or leads to fewer errors, this type of study is still underutilized in the human-computer interaction community. This study addressed only those aspects of the interface that were different between versions. The tasks the subjects performed were created in a goal-oriented way, and did not take advantage of specific differences in either version. For example, the task descriptions used goal-oriented language, not interface version specific language.

Figure 2. Dialog box for keyword search of images in the original interface. Screen title is not consistent with wording of menu item that brings user here. Instructions creep into center of screen and are not well organized. Description of '*' character uses computer language and is not well explained.

Improvements

Consistent use of colors: The revised interface used seven different color schemes, each one representing a particular function. The uniqueness of the function-color mapping makes it easy to locate the type of information needed by briefly glancing at the screen and focusing attention on the appropriate color. Each screen contains no more than four different colors reducing the distraction effect due to multiple colors. The original interface used an inconsistent color scheme.
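The two color rules described above (one color scheme per function, and no more than four colors on any screen) can be checked mechanically. The following is a minimal sketch, not the procedure used by NLM or by us; the function names, color names, and screen contents are all hypothetical and serve only to illustrate the checks.

```python
from collections import Counter

MAX_COLORS_PER_SCREEN = 4  # limit stated in the text


def duplicate_colors(function_colors):
    """Return colors assigned to more than one function (empty if unique)."""
    counts = Counter(function_colors.values())
    return sorted(color for color, n in counts.items() if n > 1)


def overloaded_screens(screens, function_colors):
    """Return names of screens showing more than MAX_COLORS_PER_SCREEN colors."""
    too_many = []
    for name, functions in screens.items():
        colors = {function_colors[f] for f in functions}
        if len(colors) > MAX_COLORS_PER_SCREEN:
            too_many.append(name)
    return too_many


# Hypothetical functions, colors, and screens, for illustration only.
function_colors = {
    "title": "white-on-blue",
    "menu": "black-on-cyan",
    "help": "yellow-on-black",
    "status": "green-on-black",
    "error": "white-on-red",
}
screens = {
    "main menu": ["title", "menu", "help", "status"],
    "search": ["title", "menu", "help", "status", "error"],
}

print(duplicate_colors(function_colors))             # colors reused across functions
print(overloaded_screens(screens, function_colors))  # screens over the color limit
```

A check of this kind only confirms that the mapping is unique and that the per-screen limit holds; it says nothing about whether the chosen colors are legible or appropriate.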

Figure 3. Retrieved-images screen with a selected item in the original interface. No title at top and the jumble of function key descriptions at the bottom can each lead to confusion. Magnification and stain information is not set apart for clear identification.

Phrasing menu items for consistency: Menu items satisfy the following conditions (Figure 4):

Function key operations (which are performed simply by pressing a function key located at the top of the keyboard) are displayed along the bottom of the screen in numerical order with the format "function key label - operation" (e.g., "F1 - Help").

The original interface used computer-oriented language in some menu items (Figure 1) and was inconsistent in labeling function keys (Figure 3).

Consistent screen layout: The top section of the screen displays information relevant to the orientation of users (see Orientation below). The label "Current Record" appears at the top left corner with the record number of the currently selected record. Each screen has a unique title which is displayed at the top center. Menus appear in the center of the screen; a menu selection can be made by moving the cursor vertically with the arrow keys. The one-line description of each menu item appears below the border of the screen. The bottom section of the screen displays functional information. At the left is "F1 - Help" and at the extreme right is "ESC - ESCape", with the other function keys in between in numerical order (Figure 6). The active window of the screen has a double-lined border while the inactive section has a single-lined border.
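The bottom-line convention can be expressed as a small layout routine. This is a sketch under our own assumptions, not the code of the MicroAnatomy program: the 80-column width, the two-space separator, and the "Select" and "Print" operations are hypothetical. Note that the function keys must be sorted by number, since a plain text sort would place F10 before F2.

```python
def function_key_number(label):
    """Return the numeric part of a label such as 'F10'."""
    return int(label[1:])


def function_key_bar(operations, width=80):
    """Build the bottom line: F-keys in numerical order at left, ESC at far right."""
    keys = sorted((k for k in operations if k != "ESC"), key=function_key_number)
    left = "  ".join(f"{k} - {operations[k]}" for k in keys)
    if "ESC" not in operations:
        return left
    right = f"ESC - {operations['ESC']}"
    # Pad so that the ESC entry ends at the right edge of the screen.
    gap = max(2, width - len(left) - len(right))
    return left + " " * gap + right


# Hypothetical key assignments, for illustration only.
bar = function_key_bar(
    {"F10": "Print", "ESC": "ESCape", "F2": "Select", "F1": "Help"}
)
print(bar)
```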

Orientation and information display: The menu structure has no more than five levels. The menu item selected becomes the exact title for the next screen to remind users of their choice (Figures 4-5). The currently selected record number is displayed. If a list of options requires more than a page to display all the options, there is an indication of the page number at the top of the screen, as well as PgUp and PgDn references. Hitting ESCape always returns users to the previous menu so that users can easily back out of selections. Input values are echoed to the screen providing confirmation feedback.
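Three of the orientation rules above lend themselves to a simple stack model: the selected menu item becomes the title of the next screen, ESCape returns to the previous menu, and the structure is at most five levels deep. The sketch below is our illustration of that model, not the program's implementation; the class name and the menu item names are hypothetical.

```python
MAX_MENU_DEPTH = 5  # the text states the menu structure has no more than five levels


class MenuNavigator:
    """Tracks the path of selected menu items as a stack of screen titles."""

    def __init__(self, root_title):
        self._titles = [root_title]

    @property
    def title(self):
        """Title of the current screen: exactly the last item selected."""
        return self._titles[-1]

    def select(self, menu_item):
        """Descend one level; the chosen item becomes the new screen title."""
        if len(self._titles) >= MAX_MENU_DEPTH:
            raise ValueError("menu structure would exceed five levels")
        self._titles.append(menu_item)
        return self.title

    def escape(self):
        """Return to the previous menu; at the top level, stay where we are."""
        if len(self._titles) > 1:
            self._titles.pop()
        return self.title


# Hypothetical menu items, for illustration only.
nav = MenuNavigator("Main Menu")
nav.select("Search by Keyword")
print(nav.title)
nav.escape()
print(nav.title)
```

Keeping the titles on a stack means the screen title and the ESCape destination are always derived from the same record of the user's choices, so the two cannot drift apart.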

Experimental procedures

Figure 4. Main menu screen in the revised interface. Improvements include wording consistent with task domain (e.g., "print" instead of "report"), a one-line description of the highlighted menu item that is always shown, and a clearer and more consistent description of the ESC key.

We used a within-subject design to test whether the revised interface was clearer and more comprehensible than the original interface for first-time users. This would be evidenced through faster task completion times because there were no execution speedups made between versions, only changes to interface organization, color, and word choice as described earlier. The presentation order of versions was counterbalanced. Three pilot subjects were used to test the experimental tasks and procedure; changes were made to decrease the number of tasks and to use more descriptive text to explain some of the tasks. The procedure for testing each subject was:

Participants

Nineteen University of Maryland staff and students were the participants. Eleven were male and eight were female. Approximately fifteen were students and four were staff. There were no qualification requirements imposed on the subjects for participating in the experiment. Some participants had computer experience, fewer had used some sort of computer catalog system, and fewer still had used a computer database system. The seven participants who were freshman and sophomore psychology undergraduates were given two "experiment credits" that counted towards their fulfillment of course requirements. The rest of the subjects were paid ten dollars for their participation. All data were collected anonymously.

Materials

The experiment was conducted on an IBM PC AT computer with an IBM InfoWindows color display, Pioneer 6000 videodisk player, the NLM videodisk with magnified images of human cells, and a Sony color monitor on which the videodisk images appeared. The experimenters used a stopwatch to time the tasks. When subjects did not issue voice commands, the experimenter judged when the task was initiated and started the stopwatch; it was always clear when the task was completed. Times were rounded to the nearest second. The two sets of task descriptions were nearly identical; only minute details (e.g., record numbers) were changed so that subjects could not rely on memorization of answers from the first task set to apply to the second task set. The tasks were:

Find all the image records that have to do with "heart".

Now view the detailed textual information about frame #06201 by Selecting it.

Figure 5. Dialog box for keyword search of images in the revised interface. Notice that the screen title is consistent with the menu item that was chosen to bring the user here. The instructions remain confined to the bottom line and use everyday language to explain use of the '*' character.

Figure 6. Retrieved-images screen with a selected item in the revised interface. Title of screen is consistent with menu item, column labels are less violent ("matches" instead of "hits"). At bottom of screen function keys appear in numerical order, and magnification and stain are placed away from the instructions for clarity.

Whenever there is a list of image records on the screen, there is a choice as to whether the image will appear automatically on the video monitor simply by using the arrow keys to move the highlight bar to that line on the screen. This is called Autodisplay mode.

Some video images have tissue labels associated with them that will appear overlayed on the image on the video monitor; however, not all images have these tissue labels. Whether the tissue label will appear or not depends on the value of the Video (tissue) Label mode.

Load the Slideshow/Showfile "long.sho" to be the current Slideshow/Showfile such that its contents are the only contents in the Slideshow/Showfile.

The 72-item Questionnaire for User Interface Satisfaction was used to collect subjective reactions (QUIS is available for license in paper, Macintosh, and MS Windows formats. Contact Carolyn Garrett at Office of Technology Liaison, University of Maryland, 4312 Knox Road, College Park, MD 20742. 301-405-4210, Carolyn_A_Garrett@umail.umd.edu).

Results

A paired samples t-test was run for both the timing data and QUIS data. Mean times were computed individually for each task; there was a statistically significant difference (p < .01) favoring the revised interface for five out of twenty tasks (Table 1).

One task, task 6, favored the original interface (p < .01). Tasks 10d and 11 were not analyzed because fewer than half of the subjects completed these complex slideshow editing tasks within the time limit.
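For readers unfamiliar with the paired samples t-test used on the per-task timing data, the following minimal sketch shows how the statistic is formed from each subject's pair of times. It is not the statistics package used in the study, and the five pairs of times in the example are invented for illustration; they are not data from the experiment. With nineteen participants the test has 18 degrees of freedom; looking up the p value from the t distribution is left to a table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(original_times, revised_times):
    """Return (t, degrees_of_freedom) for matched per-subject task times."""
    if len(original_times) != len(revised_times):
        raise ValueError("each subject needs one time per interface")
    # The test operates on within-subject differences, not on the raw times.
    differences = [o - r for o, r in zip(original_times, revised_times)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two subjects are required")
    mean_difference = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_difference / standard_error, n - 1


# Invented times in seconds for five hypothetical subjects (not study data).
original = [120, 95, 140, 110, 130]
revised = [100, 90, 115, 100, 105]
t, df = paired_t_statistic(original, revised)
print(f"t({df}) = {t:.2f}")
```

A positive t here means the original interface took longer on average; because each subject serves as his or her own control, differences in overall speed between subjects do not inflate the error term.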

In the QUIS data, 19 out of the 72 questions favored the revised interface with a statistically significant advantage (p < .05) over the original interface (Table 2). Five of the six questions inquiring about the system overall showed statistically significant differences favoring the revised interface (p < .02). Specifically, the revised interface, when compared to the original interface, received a higher rating on these dimensions:

Table 1. The mean time to complete each task for each interface is listed with the standard deviation in parentheses. An underlined time denotes that a statistically significant difference (p < .01) favored that interface for that task. A time limit of 300 seconds was imposed for completion of each task. There were 19 participants.

Some of the other revealing QUIS questions which favored the revised interface (p < .05) were:

Discussion

We believe that the revised interface yielded faster performance and higher satisfaction because of how information was displayed with respect to location, wording, and color choices. Consistent location on the screen for key objects allows users to find and attend to them easily. Using consistently assigned color schemes for conceptually similar objects allows extra information to be displayed without cluttering the screen or confusing users (Hoadley 90; Marcus 86). Another major difference that makes the revised design more usable is word choice; this is especially evident in the slideshow menus. Words consistent with the task domain such as "print," "show," and "create/edit" were comprehended more quickly than "report," "run," and "review/edit," respectively.

Task 6 yielded faster performance with the original interface. In the revised interface, the function key approach to printing had been inadvertently removed (this was not one of our suggestions for improving the interface design!). This made the task difficult to complete unless subjects read the help screen; task 6 had the longest mean time with the revised interface.

Two subjects offered handwritten comments on the QUIS forms. Both stated that the original interface was harder to use and less understandable than the revised interface. The comments were:

The revised interface was rated superior by a statistically significant difference for all QUIS items about accessing help and the content of help because the original interface had no working help component. We do not believe that the inclusion of help in the revised interface made a substantial difference in the outcome. The additional time spent reading the help was included in task time. More than half of the participants attempted to use the help. A further study would need to be conducted to examine this issue independently.

A log of comments was kept on how the participants reacted during the experiment. For the most part, the participants had a hard time with certain aspects of the system. For example, most did not perform well on slideshow editing. We were also surprised that some of the participants who accessed help were not able to complete a task even though they viewed all the information needed. Since there was no training prior to the tasks, it is not surprising that subjects had difficulty. This is similar to performance we have seen on other systems in which users were required to begin work without training.

Conclusions

We were pleased to obtain experimental support showing that a modest number of changes to create a revised interface can produce measurable performance and satisfaction differences. The principal guidelines we followed to suggest improvements can be applied to many interactive computer systems.

Subjective user satisfaction should be given adequate attention as a determinant of interface success. Attention to details, such as status feedback and specific rather than generic prompts, can give users a more confident feeling about interacting with a computer system. Careful attention should be paid to issues of color choice, screen layout, and word choice, the latter using application domain terminology.

Acknowledgements

We thank Degi Young for the original usability evaluation of the MicroAnatomy program and suggestions for its improvement, for help administering the experiment, and for providing good cheer and good deeds.

Dr. Kent Norman provided valuable help with analyzing the statistical data and reviewing a draft of this paper. Leslie Carter helped significantly in designing the experimental method and statistical analysis. Dr. Catherine Plaisant also helped to design the experiment and review drafts of this paper. Andrew Sears provided expert assistance with the computer statistics package and a thoughtful review. This research was funded by the National Library of Medicine, contract number 467-MZ-000159.

References

Borgman, C.L. (1986) Why are online catalogs hard to use? Lessons learned from information retrieval studies, Journal of the American Society for Information Science, Vol. 37, No. 6, 387-400.

Botafogo, R. (1990) Structural Analysis of Hypertexts, Masters Thesis, Department of Computer Science, University of Maryland, College Park, MD 20742.

Brown, C. Marlin (1988). Human-Computer Interface Design Guidelines, Ablex, Norwood, NJ.

Chin, J.P., Diehl, V.A., and Norman, K. (1988) Development of an instrument measuring user satisfaction of the human-computer interface, Proceedings of the Conference on Human Factors in Computing Systems, CHI '88. Association for Computing Machinery, New York, 213-218.

Conklin, J. (1987) Hypertext: An introduction and survey, IEEE Computer Vol. 20, No. 9, 17-41.

Converse, S., et al. (1987) Where can I find ? An evaluation of a computerized directory information system, (Unpublished), Department of Psychology, Old Dominion University, Norfolk, VA 23529.

Grudin, J. (Oct. 1989) "The Case Against User Interface Consistency," Communications of the ACM, Vol. 32, No. 10, 1164-1173.

Grudin, J. (Jan. 1992) Consistency, Standards, and Formal Approaches to Interface Development and Evaluation: A Note on Wiecha, Bennett, Boies, Gould, and Greene, ACM Transactions on Information Systems, Vol. 10, No. 1, 103-111.

Hildreth, C. (1982) Online Public Access Catalogs: The User Interface, Dublin, OH: OCLC.

Hoadley, E. (Feb. 1990) Investigating the Effects of Color, Communications of the ACM, Vol. 33, No. 2, 120-125.

Holum, K. (1988) Reliving King Herod's Dream, Archaeology, May/June 88, 44-47.

Holum, K., Hohlfelder, R., Bull, R., Raban, A. (1988) King Herod's Dream - Caesarea on the sea, W.W. Norton & Company, New York.

Kearsley, G. (1993). Public Access Computer Systems, Ablex, Norwood, NJ.

Kellogg, W. (1989) The Dimensions of Consistency, in Nielsen, Jakob (editor) Coordinating User Interfaces for Consistency, Academic Press, San Diego, 9-20.

Lewis, C., Polson, P., Wharton, C., and Rieman, J. (1990) Testing a Walkthrough Methodology for Theory-Based Design of Walk-Up-and-Use Interfaces, Proc. of the Conference on Human Factors in Computing Systems, CHI '90. Association for Computing Machinery, New York, 235-242.

Marcus, A. (Nov. 1986) The Ten Commandments of Color: a Tutorial, Computer Graphics Today, Vol. 3, No. 11, 7-14.

Marchionini, G., Shneiderman, B. (1988) Finding facts vs. browsing knowledge in hypertext systems, IEEE Computer Vol. 21, No. 1, 70-79.

Nielsen, J. (1989) Executive Summary: Coordinating User Interfaces for Consistency, in Nielsen, Jakob (editor), Coordinating User Interfaces for Consistency, Academic Press, San Diego, 1-7.

Plaisant, C. (1991) Guide to Opportunities in Volunteer Archaeology: case study of the use of a hypertext system in a museum exhibit, Hypertext/Hypermedia Handbook, Berk, E. & Devlin, J., Eds., McGraw-Hill Publ., New York, NY 498-505.

Potter, R., Shneiderman, B., Weldon, L. (1988) Improving the accuracy of touch-screens: an experimental evaluation of three strategies, Proc. of the Conference on Human Factors in Computing Systems, ACM SIGCHI, New York, 27-30.

Reisner, P. (1990) "What is Inconsistency?" Proceedings of the IFIP Third International Conference on Human-Computer Interaction, Interact '90. Elsevier Science Publishers B.V., North-Holland, 175-181.

Shneiderman, B. (1987) User interface design for the Hyperties electronic encyclopedia, Proc. Hypertext '87 Workshop, University of North Carolina, Department of Computer Science, Raleigh, NC, 199-204.

Shneiderman, B., Kearsley, G. (1989) Hypertext Hands-On!, Addison-Wesley Publ., Reading, MA, 192 pages + 2 PC disks.

Shneiderman, B., Brethauer, D., Plaisant, C., Potter, R. (May 1989) Evaluating three museum installations of a hypertext, Journal of the American Society for Information Science, Vol. 40, No. 3, 172-182.

Shneiderman, B. (1992) Designing the User Interface: Strategies for Effective Human-Computer Interaction, second edition, Addison-Wesley, Reading, MA.

Smith, S., and Mosier, J. (1986). Guidelines for Designing User Interface Software. Report ESD-TR-86-278 Electronic Systems Division, The Mitre Corporation, Bedford, MA.

Wang, X., Liebscher, P., Marchionini, G. (1988) Improving information seeking performance in Hypertext: roles of display format and search strategy. Tech. Report No. CAR-TR-353 CS-TR-2006., Human-Computer Interaction Laboratory: University of Maryland, College Park.

Wiecha, C., Bennett, W., Boies, S., Gould, J., and Greene, S. (July 1990) ITS: A tool for rapidly developing interactive applications, ACM Transactions on Information Systems, Vol. 8, No. 3, 204-236.

Wiecha, C. (Jan. 1992) ITS and user interface consistency: A response to Grudin, ACM Transactions on Information Systems, Vol. 10, No. 1, 112-114.

Young, D., and Shneiderman, B. (July 1989) Guidelines to designing an effective interface for the MicroAnatomy Visual Library System, unpublished research report, Human-Computer Interaction Laboratory, University of Maryland.

Young, E., Walker, J. H. (1987) A case study of using a manual online, unpublished manuscript, Symbolics Corp, Cambridge, MA.