Shneiderman, B., Empirical studies of programmers: The territory, paths, and destinations, Keynote address for workshop. In E. Soloway and S. Iyengar (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ (1986), 1-12.

EMPIRICAL STUDIES OF PROGRAMMERS: THE TERRITORY, PATHS, AND DESTINATIONS

Ben Shneiderman
Department of Computer Science
University of Maryland
College Park, MD 20742

ABSTRACT

This paper attempts to describe the varied intellectual territory that programmers work in. It offers several paths for researchers who wish to explore this territory: controlled experiments, observational or field studies, surveys, and cognitive theories. Finally, it suggests several important destinations for researchers: refining the use of current languages, improving present and future languages, developing special purpose languages, and improving tools and methods.

1. INTRODUCTION

Computer programming is a challenging and exciting adventure for many people. It offers the joyous experience of creation and the sweet taste of success as a reward for correct performance. Satisfaction is especially strong if the effort has been long and tiring due to errors. The strong reinforcement of programming is enhanced by the sense of power in controlling the computer and the mastery of a small, personally-defined world.

Programming, like music, blends esthetics and technology. The high-level plan, the middle-level concepts, and the low-level details must be correct and in harmony with each other. Discordant data structures or missed notations are jarring.

Success in programming has traditionally been measured in terms of efficient use of storage and machine resources, accuracy of the numeric results, adherence to specifications, adaptability to change, and portability. These are vital criteria, but questions about the human dimension have become equally important: Is the program readable by other programmers who must test, debug, or maintain it?
Is the programming language learnable, convenient for expressing certain algorithms, or comprehensible to novice users? Are design methods, flowcharts, documentation aids, or browsers helpful?

Measures of human performance in programming have become valued not only for the guidance they provide to professional or novice programmers, but also for the evidence they provide about complex human cognitive processing. Empirical studies of programmers are a golden opportunity for psychologists to study human problem-solving and to contribute to the refinement of programming languages, training, tools, and design methods.

This paper offers a personal view of the territory covered by empirical studies of programmers, describes possible research paths for exploring this territory, and suggests some appealing destinations.

2. THE TERRITORY

Like the surface of the earth, programming is highly diverse. Researchers must recognize the differing needs of the warm beaches of student programming, the dense forests of real-time control software, and the jagged mountains of expert systems programming. The following sections categorize the terrain, offering researchers a decomposition of concerns. No research project can cover all the territory. Progress will be made incrementally by ideas that protect student programmers while they acquire skills, support testing for real-time systems, or help professionals maintain complex knowledge bases.

2.1 Programming Phases

Programming begins with the formulation of a problem to be solved. The early stages might be called requirements analysis or preliminary design. This is followed by specification of what the program should do, but not how it should be done. Then the detailed design sets the stage for coding statements in a programming language. After personal or peer review, the program can be tested to see if it adheres to the specifications. If not, debugging is performed to isolate and repair bugs.
Most programs must be revised or extended as part of a maintenance process.

This neatly described process is violated more often than not. Some programs may go through six identifiable stages, while others go through eight or ten. Sometimes coding of known components begins before full specification is done. Sometimes testing of prototypes is necessary to formulate the requirements. In short, programming has many variant forms and multiple stages (Curtis et al., 1986; Weiser & Shneiderman, 1986; Shneiderman, 1980).

Language design issues are related to programming phases. Some languages may ease readability, thus supporting maintenance, but they are tedious and lengthy to compose. Some languages offer great flexibility in data or control, thus simplifying design, but making debugging very difficult. If we understand these differences, we can redesign languages and support tools to maximize performance. For example, abbreviations or template selection can speed program creation, while the editor supports readability by displaying complete keywords and indenting automatically.

2.2 Programming Situations

A program might be created by one person who uses it only once, for example, an amateur astronomer computing the sighting angle for Halley's comet. In another situation, three or four people might construct software for a small business to do weekly inventory. At the other extreme are the thousand programmers who build the air traffic control system that is used daily for life-critical decision-making tasks.

Personal use software that is run only once does not require documentation, may not need to be modular, can be easily tested, and does not cause loss of life if an error is made. Working in large groups on complex software requires a different attitude towards the choice of variable names, documentation strategies, management structure to permit changes, modular decomposition, and testing.
Researchers must take into account the different programming situations.

2.3 Program Size and Complexity

Some programs have only ten lines, while others, such as the space shuttle software, have 10 million lines. Some programs use one array, while others have databases supported by data dictionaries listing 10 thousand variable names. Some programs use only input and print statements, while others have rich patterns of control flow, elegant coroutines, and multiple real-time interrupts. Some programs take only an hour to compose; others take many years of effort by large teams of programmers. Complexity may range over six orders of magnitude. Clearly, the programming process must be vastly different across this terrain. BASIC may be acceptable for short programs with 10 variable names, but when long programs are written, more meaningful variable names, better modular decomposition, and richer control and data structures are necessary. Can one programming language be flexible enough to meet the needs of these differing conditions? Are there dangers in scaling up from a small prototype to a large system?

2.4 Skill Levels

Graduates of six-week courses may call themselves programmers, but their skills are vastly different from those of the thoughtful professional with ten years of experience. Some languages may be suitable for novices, but unacceptable for professionals because of limitations in features. Languages with many features may overwhelm novices, but be attractive to experts. Can one language have enough flexibility to be suitable for both beginners and experts?

Expertise has many dimensions. Programmers may be expert in the syntax of a specific programming language, in certain algorithms (e.g., string manipulation, graphic data structures), or in an application area (e.g., payroll, banking, or chemistry). Some programmers are exceptional at design, while others excel at debugging (Weinberg, 1971).
Underlying differences in cognitive style, learning style, or personality contribute to skill and influence effectiveness. Increased attention is being paid to individual differences among programmers and to complementary personalities in forming teams (Buie, 1985). Can personality tests be used to form teams or predict success in programming? Unfortunately, we still have only poor methods for evaluating programmer skill, as demonstrated by performance differences of 20 to 1 and more among professionals in the same organization with the same job title (Curtis, 1981; DiPersio, Isbister & Shneiderman, 1980). Is there hope for developing a reliable ability test? How can peer ratings or supervisor evaluations be made more accurate? Can we measure quality and productivity for programmers?

2.5 Programming Languages

Approximately 200 computer programming languages are widely used and documented, and the variety among them is impressive (Sammet, 1978). They range from languages with 30-40 keywords to languages with more than a thousand variant commands. There are languages for special purposes such as steel structure analysis or music composition, and languages for general use such as PASCAL, FORTRAN, or C. There are interpreted and compiled languages, textual and visual languages, procedural and non-procedural languages, fixed and extensible languages, and interactive and batch languages.

Researchers must recognize the style of programming promoted by each language and consider design suggestions within that context. Indentation rules may be helpful in PASCAL but confusing in APL. Rules for modular design or parameter passing in COBOL may be unsuitable for LISP or PROLOG. Experimental results have influenced Ada and other recent languages, but designers still work largely from intuition.
A major research project to investigate the impact of programming language design on productivity and error rates, and to develop an understanding of the cognitive processes in programming, could have a dramatic influence on future programming languages.

2.6 Programming Tools and Methods

Programming is greatly influenced by the available tools. The advent of interactive usage changed the preferences in programming languages, design and debugging methods, and teaching strategies. The emergence of syntax-directed editors, browsers, interactive design aids, dynamic debuggers, and similar tools changed the nature of programming. The highly argumentative discussions about the design of these tools could be made more productive and scientific by the inclusion of empirical data about usage. When does rapid response time aid programming? How might large screens be a hindrance? Do graphical or visual displays of data structures or program execution really help?

There is also a diverse set of programming methods, such as those proposed by Jackson, Warnier-Orr, Mills, or Parnas. These methods are often vaguely defined and require interpretation when they are applied. A better understanding of when each method is most effective would be a major contribution.

3. THE PATHS

Explorations of the territory are increasingly well-planned, guided by practical considerations, and directed by a theoretical framework.

3.1 Controlled Experiments

There are many ways to do research on programmers, but the discipline of controlled psychologically-oriented experiments produces reliable and authoritative results. No single experiment can conclusively answer all our questions, but a series of well-designed and narrowly focussed studies will clarify an area of concern. Each experiment is a small tile in the emerging mosaic of human performance in programming (Shneiderman, 1980; Platt, 1964).
Effective and influential studies are built on a solid theoretical foundation and a knowledge of realistic concerns in practical programming situations. These dual bases lead to a lucid and testable hypothesis, followed by the identification of a small number of independent variables to alter and a small number of dependent variables to measure. The outcome should suggest refinements to the theoretical foundation and guidance to the professional practitioner.

Successful experiments require great care in the selection of subjects, assignment to groups, choice of tasks, design of materials and instructions, diligence in administration, wisdom in applying statistical techniques, and skill in writing up the results (Brooks, 1980; Moher & Schneider, 1981). At least one pilot test of the materials and procedures is necessary for refinement, but successful studies often develop only after a series of preliminary experiments. The high variability in human performance with complex cognitive tasks such as programming often obscures the modest differences produced by the independent variable treatments. Refinement of the dependent measures can help, and within-subjects designs are also effective.

Controlled experiments can be so rigid and artificial that the laboratory-like conditions do not reflect or apply to the reality of programming. Each step in improving controls and eliminating biases may also be a step that reduces the validity and applicability of the results. The knowledgeable experimenter will also blend the discipline of controlled experimentation with a sensitivity for individual performance, a curiosity about extreme scores, and an awareness of anecdotal results that may lead to novel hypotheses.

3.2 Thinking Aloud and Observational Studies

The precision and rigidity of controlled experimentation may be inappropriate when a researcher is exploring new domains in which the independent and dependent variables are unclear.
Thinking aloud or observational studies are applicable when the tasks are so complex and varied that setting precise goals for subjects would invalidate the experiment (Lewis, 1982). Typical situations are early stages of program design or exploratory debugging. The subjects can be videotaped, audiotaped, or simply observed for critical incidents or frequency of reference to specific terms or processes (Littman et al., 1986; Brooks, 1977). Observational studies in realistic settings can lead to the discovery of work styles that can be helpful to others or suggest novel software support tools (Adelson & Soloway, 1985; Grantham & Shneiderman, 1984). The dangers of observational studies are that they take a large amount of experimenter time and that the resulting conjectures may not be widely applicable. Furthermore, cause and effect relationships cannot be easily demonstrated. Still, they are a valuable tool that should be considered.

3.3 Field Studies, Surveys, and Data Capture

Other less formal approaches can contribute valuable insights into the programming process. Field studies to capture performance during professional or student programming are helpful in identifying baselines of normal behavior (Saal & Weiss, 1977; Knuth, 1972). Surveys can elicit useful opinions about which programming methods, tools, and languages are beneficial. Machine logging of response times, frequency of use of programming language features, and frequency of use of programming tools such as dynamic debuggers can also help to clarify actual programmer activity.

There is a great opportunity to conduct grand national "clinical trials" of programming languages, tools, or methods. As in clinical medical trials, data would be collected from a large number of programmers in a large number of programming situations. This enormous database would be available for many researchers to explore hypotheses, probe for unexpected correlations, and compare performance over time as changes are made.
A realistic clinical trial might take five years and cost $10-20 million, but the benefits are potentially much greater.

3.4 Theories

No area is more important to the health and growth of empirical studies of programmers than the development of cognitive theories of programming. No one theory will encompass all of programming; many smaller theories are necessary. Some theories may address only the use of conditionals by novice child programmers. Other theories may suggest effective maintenance procedures for professionals working on parallel processing algorithms. Some theories are broadly conceived and explanatory (a framework for describing or teaching programming), while more narrowly focussed theories can be predictive (a mathematical model for predicting the time to locate a bug as a function of program complexity).

The syntactic/semantic model of long-term memory is an explanatory model, but as it is refined it has the potential to become more predictive (Shneiderman & Mayer, 1979; Shneiderman, 1986) (Figure 1). In this model, syntactic knowledge consists of the language-dependent details for carrying out actions or defining objects. This knowledge is arbitrary and acquired by rote memorization; therefore it is difficult to retain in long-term memory. For example, the choice of the exponentiation operator (single asterisk in APL, double asterisk in FORTRAN, up-arrow in BASIC, or a function in LISP or PASCAL), the iteration keyword (DO, FOR, LOOP, or REPEAT), or the use of semicolons (to terminate or to separate statements) is language-dependent and arbitrary. This knowledge must be rehearsed frequently if it is to be retained.

By contrast, semantic knowledge is meaningfully acquired by reference to previous knowledge, by example, or by analogy. There is a logical structure to semantic knowledge that is independent of the specific syntax used to record it. Semantic knowledge is further decomposed into computer-related and task-related domains.
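A short sketch can make the distinction concrete. The Python below is an illustration added for this discussion, not part of the model's original presentation: the dictionary `POWER_SYNTAX` holds syntactic knowledge (arbitrary spellings of exponentiation, taken from the examples above), while `power` stands for the semantic action itself. Both names are hypothetical. Two spellings are assumptions worth flagging: LISP is shown with Common Lisp's `EXPT` function, and because standard PASCAL has no built-in power operator, the common `EXP(B * LN(A))` idiom (valid only for positive A) stands in for it.

```python
"""Syntactic vs. semantic knowledge for one action: exponentiation.

POWER_SYNTAX and render_power are hypothetical illustrations. The LISP
entry assumes Common Lisp's EXPT; the PASCAL entry assumes the usual
EXP/LN idiom, since standard Pascal has no power operator (A must be > 0).
"""

# Syntactic knowledge: arbitrary, language-specific spellings that must be
# memorized by rote, separately for each language.
POWER_SYNTAX = {
    "APL": "{a}*{b}",              # dyadic star
    "FORTRAN": "{a}**{b}",         # double asterisk
    "BASIC": "{a}↑{b}",            # up-arrow; later dialects write ^
    "LISP": "(EXPT {a} {b})",      # a function call
    "PASCAL": "EXP({b} * LN({a}))",
}


def render_power(language: str, a: str, b: str) -> str:
    """Spell 'raise a to the power b' in the given language's syntax."""
    return POWER_SYNTAX[language].format(a=a, b=b)


def power(a: float, b: float) -> float:
    """Semantic knowledge: the action itself, independent of notation."""
    return a ** b


if __name__ == "__main__":
    for lang in POWER_SYNTAX:
        print(f"{lang:<8} {render_power(lang, 'X', '2')}")
    print("value:", power(3.0, 2.0))
```

Running it prints five different spellings of the same action, X to the power 2, followed by the value of 3.0 to the power 2.0. Only the table varies from language to language; the meaning being expressed does not.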
Computer knowledge has to do with the actions and objects in the computer domain. The low-level actions might be assignment, iteration, conditional execution, input, output, synchronization, etc. Higher-level actions are algorithms for sweeping through an array, for sorting items, or for recursive binary tree search. Computer objects include the low-level data types such as booleans, integers, strings, real numbers, etc. Higher-level objects include arrays, records, stacks, or threaded trees.

The semantics of the task domain can also be decomposed into objects and actions. If the task domain is stock market portfolio management, then the actions might include opening or closing a portfolio, buying or selling stocks, and tabulating or displaying performance over a three-month period. The objects might include portfolios, customers, stocks, buy or sell orders, transaction dates, and prices.

Successful programmers must master the syntax of a specific programming language, the semantics of computer programming, and the semantics of the task domain. Novice programmers may be familiar with a task domain (for example, inventory control or satellite orbit determination), but must acquire the semantics of computer programming and the syntax of one or more programming languages. Professional programmers, by contrast, must become proficient in the task domain before they can begin design. It is hard to say which challenge is greater.

Any model is an abstraction of reality. The syntactic/semantic model is not a perfect representation of a programmer's knowledge or cognitive process, but it has been helpful in sorting out the differences between novice and expert programmers, teaching programming, writing textbooks, understanding bugs, and preparing documentation. Still, it is a rough model that needs refinement and verification (Barfield, 1986; Wiedenbeck, 1986; Adelson, 1981; McKeithen et al., 1981; Shneiderman, 1977).
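The three kinds of knowledge can be seen side by side in even a small program. The Python sketch below is a hypothetical illustration, not drawn from the studies cited: its classes and methods model the portfolio task domain described earlier in this section, with portfolios, customers, stocks, orders, dates, and prices as objects, and buying, selling, closing, and tabulating as actions (creating a `Portfolio` plays the role of opening one). The list, dictionary, loop, and conditional inside are computer-domain semantics, and the Python spelling of all of it is syntactic knowledge. All names, the ticker symbols, and the choice to tabulate net shares per stock as a stand-in for "performance" are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Order:
    """Task-domain object: a buy (shares > 0) or sell (shares < 0) order."""
    stock: str
    shares: int
    price: float
    when: date


@dataclass
class Portfolio:
    """Task-domain object; its methods are task-domain actions."""
    customer: str
    orders: list[Order] = field(default_factory=list)  # computer object: a list
    is_open: bool = True  # a newly created portfolio starts out open

    def buy(self, stock: str, shares: int, price: float, when: date) -> None:
        self._record(Order(stock, shares, price, when))

    def sell(self, stock: str, shares: int, price: float, when: date) -> None:
        self._record(Order(stock, -shares, price, when))

    def close(self) -> None:
        self.is_open = False

    def tabulate(self, start: date, end: date) -> dict[str, int]:
        """Net shares bought or sold per stock between start and end, inclusive."""
        totals: dict[str, int] = {}
        for order in self.orders:  # computer action: sweep through a list
            if start <= order.when <= end:  # computer action: conditional
                totals[order.stock] = totals.get(order.stock, 0) + order.shares
        return totals

    def _record(self, order: Order) -> None:
        if not self.is_open:
            raise ValueError("cannot trade in a closed portfolio")
        self.orders.append(order)


if __name__ == "__main__":
    p = Portfolio("A. Investor")  # hypothetical customer and tickers
    p.buy("ACME", 100, 150.0, date(1986, 1, 10))
    p.sell("ACME", 40, 155.0, date(1986, 2, 3))
    p.buy("BOLT", 50, 60.0, date(1986, 5, 1))
    # Three-month window: the May order falls outside it.
    print(p.tabulate(date(1986, 1, 1), date(1986, 3, 31)))  # {'ACME': 60}
```

A reader who knows portfolio management but not Python can still recognize `buy`, `sell`, and `tabulate`; the few lines of list handling are where computer semantics and syntax must be learned.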
A variant model based on plans and goals has been formulated by Soloway and his colleagues (Soloway et al., 1983; Soloway & Ehrlich, 1984; Rist, 1986).

4. THE DESTINATIONS

Some explorations are justifiable just for the joy of discovery. Understanding human performance in computer programming is a satisfying process, and it has important benefits both for improving the practice of programming and for sharpening our model of human cognition. As a computer scientist, my primary goal is the former and my secondary goal is the latter. I am proud to consider myself twenty percent experimental psychologist, but my approach has been to apply empirical techniques to the benefit of computer programming practices, languages, tools, and management practices.

Contemporary programming languages are a remarkable achievement. They provide a notation for precise expression of complex and useful processes. This notation is parsable by a computer and yet more or less readable by people. This balance generates remarkable power to accomplish intellectual, commercial, educational, and entertainment tasks. These successes might be attributed to the clever choice of a small number of control and data structures, the provision of facilities for modular decomposition of large problems, and the capacity for extensibility by function and data type definition.

These successes are tempered by many weaknesses, or shall I say, opportunities for improvement. Contemporary programming languages often take a long time to learn, are error-prone, and can be tedious to compose with. There seem to be insufficient mechanisms for level structuring, weak facilities for checking correctness, and poor tools for maintenance. In short, there are many destinations available for the energetic researcher.

4.1 Refining the Use of Current Languages

The most direct means of improving programmer productivity and program quality is to make incremental improvements in the use of current languages.
The following brief list offers some ideas about where research efforts might lead to rapid and substantial improvement:

- better use of mnemonic variable names: more meaningful and distinctive mnemonic variable names can be helpful in comprehending programs. This effect becomes noticeable when the number of variable names grows beyond 10-15 and when the programmers are unfamiliar with the task domain. Excessively long variable names pose a danger, however, since they can interfere with comprehension. A range of variable name lengths in a program may support recognition. An in-depth study of naming policies, hierarchical naming strategies, and name documentation methods would be useful. What makes a name memorable? Is it distinctiveness within a name set, familiarity to the user, or close linkage to meaning?
- program formatting: indentation has often been cited as an aid to comprehension, but it can have negative effects by breaking up the semantic units of a program in favor of syntactic units. A modest level of indentation (2-4 spaces) has been shown to be beneficial (Miara et al., 1983). The use of blank lines to show modular organization is also helpful. The use of multiple fonts and character sizes is being explored as an aid to program comprehension (Baecker & Marcus, 1986). How should programs be shown on screens vs. paper? Would color coding be helpful?
- comments, documentation, flowcharts, and diagrams: embedded comments, external writeups of program structure, pseudocode design documents, control flow oriented flowcharts, structure charts, and data structure diagrams are promoted as aids to program composition or comprehension. For the knowledgeable programmer, these aids are helpful when they reveal information that is difficult to extract from the program text, for example, when one sheet of paper shows the relationships among thirty program modules (Shneiderman, 1982).
Merely reiterating the program in a detailed visual form may produce a lengthy, distracting document. Which aids are most effective, for which people, and for which tasks (Sheppard, Kruesi & Curtis, 1981)?
- modular design by data and procedural abstraction: program organization plays a key role in comprehensibility and modifiability. The top-down design, information hiding, data abstraction, procedure abstraction, and reusable component approaches to modular design offer promising possibilities that have not yet been fully exploited. Little attention has been paid to the motivation for these approaches as a function of human cognitive skills. How might modularity for novice programmers differ from modularity for experts? When is a linearly organized program more comprehensible than a modular program? When does a program reach the point of excess modularity? How should variables be shared among modules?
- group processes and team organizations: although it is difficult to perform carefully controlled experiments on group processes and team organizations, the evidence is strong that one of the most effective means of rapidly improving productivity and quality is the use of structured walkthroughs or design inspections (Freedman & Weinberg, 1982; Basili & Reiter, 1979; Fagan, 1976). Group processes provide an opportunity to communicate program designs, educate junior team members, increase motivation, and facilitate cooperation. Still, these methods are underutilized and poorly understood. Social and industrial psychologists might be helpful in developing and studying these techniques.

4.2 Improving Present and Future Languages

Researchers are often more attracted to the creation of a new language than to the refinement of current languages. Commercial programmers and managers are more interested in improving specific features and enhancing current languages. There is plenty of opportunity for both.
- control structures: the contemporary IF-THEN-ELSE and DO-WHILE patterns are effective in many cases, but they become clumsy when nesting is deep and complexity is great (Sykes, Tillman & Shneiderman, 1983; Sime, Green & Guest, 1977). More powerful CASE, SELECT, or SEARCH commands might accomplish the work of several lower-level control structures. Backtracking, recursion, co-routining, concurrent and parallel control, object-oriented, interrupt handling, and error handling control structures have yet to be adequately explored.
- data declarations: composite data types, such as a record or a stack of arrays, are a major contribution, but more elaborate data typing facilities will be helpful in the next generation of languages. Level structuring and modular design of data types, as is done in database systems, may become available in standard programming languages. How can the semantics and syntax be made comprehensible? Empirical studies can shed light on what makes one data structure more or less cognitively complex (Iyengar, Bastani & Fuller, 1985).
- direct manipulation programming: the text-oriented style of contemporary programming languages will be complemented by more visually-oriented styles of programming (Shneiderman, 1983). Programmers would see representations of objects and actions in the task domain. By pointing, dragging, and drawing, the programmer would create programs for later execution. Such programs may be easy to understand, simple to debug, and rapid to revise. Direct manipulation depends on an analogically appealing, task domain-related, visual representation of the objects and actions of interest. Actions are defined by pointing, dragging, or drawing, instead of typing command strings. Execution of actions is rapid, incremental, and reversible. The Xerox Star and Apple Macintosh offer many aspects of direct manipulation for carrying out tasks, but are weak in supporting programming.
The Wang Decision Processing and Spinnaker's Delta Drawing packages provide some of these features. There are currently many efforts to develop visual or direct manipulation programming environments (Iseki & Shneiderman, 1986; Glinert & Tanimoto, 1984; Halbert, 1984).
- spreadsheets: these powerful tools enable users to do much of the work that formerly required programming in a procedural language. Spreadsheet programs offer direct manipulation of an accountant's model of reality and provide a variety of powerful operations in a relatively simple manner. Spreadsheets are to the 1980s what COBOL was to the 1960s, yet the computer science community has largely ignored this fundamental and important innovation (Shneiderman, 1985). There seems to be a great opportunity to learn from the designs of VisiCalc, LOTUS 1-2-3, Multiplan, and similar products, and to contribute to their evolution. This visual representation of the world of action may be a useful model for designing rule-based systems, array computations, or data structures.

4.3 Developing Special Purpose Languages

General purpose programming languages are often deficient in handling some novel set of features. This leads designers to propose special purpose languages. Sometimes these languages survive to serve a small community, but often the ideas eventually influence extensions to general purpose languages. Important destinations for researchers include contributions to the development of these special purpose languages:

- rule-based and logic programming: a currently hot topic in computer science is the design of rule-based and logic programming languages. Unfortunately, the designers of these systems have not absorbed the lessons of structured programming, modular design, or human factors. The resulting systems are often difficult to program in, error-prone, and hard to comprehend. Substantial improvements are possible through modular organization of facts and rules, use of visual presentations, and improved notations.
- dialog management systems: traditional procedural programming languages do not provide adequate facilities for creating complex interactions. Dialog management systems, also called user interface management systems, separate the design of the user interface from the underlying programming tools. The interaction style can be changed by merely altering a few lines of the specifications. The design of dialog management systems will be one of the most important topics for the next decade. Creating a user interface for dialog management systems is a further challenge.
- sound, voice, graphics, videodisk, and animation: traditional programming languages also fail to provide adequate facilities for novel interaction techniques using sound, voice, graphics, videodisk, and animation. New programming languages or extensions will be needed to accommodate these features.
- robot and physical device control: when three-dimensional reality and timing constraints are part of the task domain, new languages will be necessary to support convenient programming. Programs may become cartoons or be created by physically moving a robot arm through a specific motion to paint a car door or place an integrated circuit chip on a circuit board. Modular design and editing of these programs is a provocative challenge.

4.4 Improving Tools

Effective tools amplify the user's power to do work. A pressing goal is to provide programmers with more powerful tools to cope with the increased complexity of modern systems and the higher quality and reliability that customers demand.

- syntax-directed editors: editors can be designed to reflect the objects in programs (procedures, variable names, or keywords), instead of merely dealing with strings of characters. During program composition, a syntax-directed editor can guide the programmer to create syntactically correct programs by offering a choice of permissible structures or templates.
Benefits arise through reduced keystrokes, fewer typographic errors, and hopefully fewer slips in converting a design into a program. Disadvantages include greater demands on the hardware, some distraction for the expert user, and some clumsiness in current interfaces.
- semantics-directed editors/browsers: modern program development tools enable users to conveniently edit or display programs. A hierarchical browser shows a table of contents of the modules in a program, and the user can see the code in a module by just pointing and clicking on the module name (Shafer et al., 1986). Similarly, placing the cursor on a variable name and pressing a single key can result in a display of the data declaration for the variable. Changing a variable name in its declaration can change the name everywhere it is used in the program. Finally, maintenance tools can aid in the automatic rewriting of programs. For example, elimination of all code related to a specified output variable would help in trimming a program, while conversion of a scalar variable to an array would help in expanding a program.
- testing and debugging tools: the highly personal styles of testing and debugging are giving way to disciplined approaches supported by useful and convenient tools. The design of such tools would benefit from human factors studies of how people do testing and debugging, as well as from studies of user interface design for exploratory tasks (Weiser & Lyle, 1986; Spohrer & Soloway, 1986).
- reusable libraries of code: the opportunity to reuse program or design components is attractive to managers who wish to speed development, reduce costs, and ensure reliability. The benefits are potentially great, but most observers feel that the reality falls far short of the potential. Software tools to support reuse are beginning to be developed, but the social structure of programming has to be considered as well.
First, there must be benefits both to the person whose code is reused and to the person who reuses it. Second, the atmosphere of trust needs to be enhanced by information about who developed the code, who maintains it, and who else has already reused it successfully, as well as by thorough documentation of the inputs and outputs.
5. CONCLUSION
The term programming covers a fascinating and broad territory that invites exploration by courageous and bold research pioneers. The paths are still rough, but the increasing traffic smooths out the bumps and leaves clearer signposts for future travelers. The destinations are attractive: the chance to improve programming practice and to comprehend complex human problem-solving.
Acknowledgements: I greatly appreciate this opportunity, offered by Elliot Soloway and Sitharama Iyengar, to scan the horizon and report to my colleagues. It gave me the chance to reflect on 15 years of papers by fellow researchers and students. I felt warmed and encouraged while reviewing the work of hundreds of bold researchers who have ventured from narrow professional domains to blend psychology and computer science. But this is just the beginning. The greatest opportunities are still ahead. I appreciated the helpful comments from Richard Furuta, William Gasarch, Jim Hendler, Susan Humphrey, Sitharama Iyengar, and Elliot Soloway.
REFERENCES
Adelson, Beth (1981). Problem solving and the development of abstract categories in programming languages, Memory and Cognition 9, 422.
Adelson, Beth and Soloway, Elliot (1985). The role of domain experience in software design, IEEE Transactions on Software Engineering SE-11, November 1985.
Baecker, Ron and Marcus, Aaron (1986). Design principles for the enhanced presentation of computer program source text, Proceedings of the ACM SIGCHI '86: Human Factors in Computer Systems, available from ACM, New York, NY.
Barfield, Woodrow (1986).
Expert-novice differences for software: Implications for problem-solving and knowledge acquisition, Behaviour and Information Technology, to appear.
Basili, V. R. and Reiter, R. W. (1979). An investigation of human factors in software development, IEEE Computer 12, 12, December 1979, 21-38.
Brooks, Ruven (1980). Studying programmer behavior experimentally: The problems of proper methodology, Communications of the ACM 23, 4, April 1980, 207-213.
Brooks, Ruven (1977). Towards a theory of the cognitive processes in computer programming, International Journal of Man-Machine Studies 9, 737-751.
Buie, Elizabeth (1985). Jungian psychological type and programmer team building, Proceedings of IEEE COMPSAC '85.
Curtis, Bill (1981). Substantiating programmer variability, Proceedings of the IEEE 69, July 1981, 533.
Curtis, Bill, Brooks, Ruven, Soloway, Elliot, Black, John, Ehrlich, Kate, and Ramsey, H. Rudy (1986). Software psychology: The need for an interdisciplinary program, Proceedings of the IEEE, to appear.
DiPersio, Tom, Isbister, Dan, and Shneiderman, Ben (1980). An experiment using memorization/reconstruction as a measure of programmer ability, International Journal of Man-Machine Studies 13, 339-354.
Fagan, Michael E. (1976). Design and code inspections to reduce errors in program development, IBM Systems Journal 15, 3.
Freedman, Daniel P. and Weinberg, Gerald M. (1982). Handbook of Walkthroughs, Inspections, and Technical Reviews, Third Edition, Little, Brown and Co., Boston, MA.
Glinert, Ephraim and Tanimoto, Steven L. (1984). Pict: An interactive graphical programming environment, IEEE Computer 17, November 1984, 7-25.
Grantham, Charles and Shneiderman, Ben (1984). Programmer behavior and cognitive activity: An observational study, Proceedings ACM Washington, DC Chapter Annual Technical Symposium.
Halbert, Daniel (1984). Programming by Example, Ph.D.
dissertation, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA. Available as Xerox Report OSD-T8402, Palo Alto, CA.
Iseki, Osamu and Shneiderman, Ben (1986). Applying direct manipulation concepts: Direct Manipulation Disk Operating System (DMDOS), University of Maryland Department of Computer Science Technical Report.
Iyengar, S. S., Bastani, F. B., and Fuller, J. W. (1985). An experimental study of the complexity of data structures, In Agrawal, J. C. and Zunde, P. (Editors), Empirical Foundations of Information and Software Science, Plenum Press, New York, NY, 225-239.
Knuth, Donald (1972). An empirical study of FORTRAN programs, Software: Practice and Experience 1, 105-133.
Lewis, Clayton (1982). Using the "thinking-aloud" method in cognitive interface design, RC9265, IBM, Yorktown Heights, NY, February 1982.
Littman, David C., Pinto, Jeannine, Letovsky, Stan, and Soloway, Elliot (1986). Mental models and software maintenance, In Soloway, Elliot and Iyengar, Sitharama (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ.
McKeithen, K. B., Reitman, J. S., and Hirtle, S. C. (1981). Knowledge organization and skill differences in computer programmers, Cognitive Psychology 13, 307.
Miara, Richard J., Musselman, Joyce A., Navarro, Juan A., and Shneiderman, Ben (1983). Program indentation and comprehensibility, Communications of the ACM 26, 11, November 1983, 861-867.
Moher, Tom and Schneider, G. Michael (1981). Methods for improving controlled experimentation in software engineering, Proceedings of the Fifth International Conference on Software Engineering, available from IEEE, 224-233.
Platt, John (1964). Strong inference, Science 146, Number 3642, October 16, 1964, 347-353.
Rist, Robert S. (1986). Plans in programming: Definition, demonstration and development.
In Soloway, Elliot and Iyengar, Sitharama (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ.
Saal, H. J. and Weiss, Z. (1977). An empirical study of APL programs, Computer Languages 2, 3, 47-60.
Sammet, Jean E. (1978). Roster of programming languages 1976-77, ACM SIGPLAN Notices 13, 11, November 1978, 56-85.
Shafer, Phil, Simon, Roland, Weldon, Linda, and Shneiderman, Ben (1986). Display strategies for program browsing: Concepts and an experiment, IEEE Software, to appear.
Sheppard, S. B., Kruesi, E., and Curtis, B. (1981). The effects of symbology and spatial arrangement on the comprehension of software specifications, Proceedings of the Fifth International Conference on Software Engineering, 207-214. Available from IEEE, Piscataway, NJ.
Shneiderman, Ben (1986). Designing the User Interface: Strategies for Effective Human-Computer Interaction, Addison-Wesley Publishers, Reading, MA, to appear.
Shneiderman, Ben (1982). Control flow and data structure documentation: Two experiments, Communications of the ACM 25, 1, January 1982, 55-63.
Shneiderman, Ben (1980). Software Psychology: Human Factors in Computer and Information Systems, Little, Brown and Co., Boston, MA.
Shneiderman, Ben (1977). Measuring computer program quality and comprehension, International Journal of Man-Machine Studies 9, 465-478.
Shneiderman, Ben and Mayer, Richard (1979). Syntactic/semantic interactions in programmer behavior: A model and experimental results, International Journal of Computer and Information Sciences 7, June 1979, 219-239. Reprinted in Curtis, Bill (Editor), Human Factors in Software Development, IEEE EHO 185-9, 1981.
Sime, M. E., Green, T. R. G., and Guest, D. J. (1977). Scope marking in computer conditionals: A psychological evaluation, International Journal of Man-Machine Studies 9, 107-118.
Soloway, E. and Ehrlich, K. (1984).
Empirical studies of programming knowledge, IEEE Transactions on Software Engineering SE-10, 5, 595-609.
Soloway, E., Ehrlich, K., Bonar, J., and Greenspan, J. (1983). In Badre, A. and Shneiderman, B. (Editors), Directions in Human-Computer Interaction, Ablex Publishers, Norwood, NJ, 27-54.
Spohrer, James C. and Soloway, Elliot (1986). Analyzing the high frequency bugs in novice programs, In Soloway, Elliot and Iyengar, Sitharama (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ.
Sykes, F., Tillman, R., and Shneiderman, B. (1983). The effect of scope delimiters on program comprehension, Software: Practice and Experience 13, 817-824.
Weinberg, Gerald M. (1971). The Psychology of Computer Programming, Van Nostrand Reinhold, New York, NY.
Weiser, Mark and Lyle, Jim (1986). Experiments on slicing-based debugging aids, In Soloway, Elliot and Iyengar, Sitharama (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ.
Weiser, Mark and Shneiderman, Ben (1986). Human factors of computer programming, In Salvendy, G. (Editor), Handbook of Human Factors/Ergonomics, John Wiley & Sons, Inc., New York, NY, to appear.
Wiedenbeck, Susan (1986). Processes in computer program comprehension, In Soloway, Elliot and Iyengar, Sitharama (Editors), Empirical Studies of Programmers, Ablex Publishers, Norwood, NJ.
Figure 1: The long-term knowledge necessary for programming as portrayed in the syntactic/semantic model. Syntactic knowledge is arbitrary, poorly structured, learned by rote memorization, and must be rehearsed to ensure retention. Semantic knowledge, which is subdivided into computer and task domains, is organized into objects and actions, meaningfully acquired, stable in memory, and language independent.
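Section 4.4 mentions a maintenance tool that eliminates all code related to a specified output variable. One way to approximate it is with a backward program slice, the idea behind the slicing-based debugging aids of Weiser & Lyle (1986). The Python sketch below is illustrative only. It assumes straight-line code with no branches, loops, or aliasing, and the names Stmt, backward_slice, and drop_output are hypothetical, not taken from any tool discussed in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One assignment in a straight-line program: target = f(uses)."""
    target: str              # variable written by the statement
    uses: frozenset[str]     # variables read by the statement
    text: str                # source text, kept only for display


def backward_slice(program: list[Stmt], criterion: str) -> set[int]:
    """Indices of statements that can affect `criterion` at the end of the program.

    Walks the program backwards, tracking which variables are still needed.
    Hypothetical simplification: no branches, loops, or aliasing.
    """
    needed = {criterion}
    kept: set[int] = set()
    for i in range(len(program) - 1, -1, -1):
        stmt = program[i]
        if stmt.target in needed:
            kept.add(i)
            needed.discard(stmt.target)  # this definition supplies the needed value
            needed |= stmt.uses          # so its own inputs become needed instead
    return kept


def drop_output(program: list[Stmt], outputs: set[str], removed: str) -> list[Stmt]:
    """Trim code that serves only `removed`, keeping every other output intact."""
    keep: set[int] = set()
    for out in outputs - {removed}:
        keep |= backward_slice(program, out)
    return [stmt for i, stmt in enumerate(program) if i in keep]


# Usage: a program that computes both a running total and a mean.
program = [
    Stmt("total", frozenset(), "total = 0"),
    Stmt("count", frozenset(), "count = 0"),
    Stmt("total", frozenset({"total", "x"}), "total = total + x"),
    Stmt("count", frozenset({"count"}), "count = count + 1"),
    Stmt("mean", frozenset({"total", "count"}), "mean = total / count"),
]
for stmt in drop_output(program, {"total", "mean"}, "mean"):
    print(stmt.text)  # prints "total = 0", then "total = total + x"
```

Keeping the union of the slices of the remaining outputs, rather than deleting the slice of the removed output, avoids deleting statements that feed more than one output. A realistic tool would also need control-flow and data-flow analysis to handle branches, loops, and procedure calls.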