Tree-Maps: A Space-Filling Approach to the Visualization of Hierarchical

Information Structures

Brian Johnson                     Ben Shneiderman

Brianj@cs.umd.edu     ben@cs.umd.edu

Department of Computer Science & Human-Interaction Laboratory

University of Maryland, College Park, MD 20742

Abstract

This paper describes a novel method for the visualization of hierarchically structured information. The Tree-Map visualization technique makes 100% use of the available display space, mapping the full hierarchy onto a rectangular region in a space-filling manner. This efficient use of space allows very large hierarchies to be displayed in their entirety and facilitates the presentation of semantic information.

1 Introduction

A large quantity of the world’s information is hierarchically structured: manuals, outlines, corporate organizations, family trees, directory structures, internet addressing, library cataloging, computer programs... and the list goes on. Most people come to understand the content and organization of these structures easily if they are small, but have great difficulty if the structures are large.

We propose an interactive visualization method for presenting hierarchical information called Tree-Maps. We hope that the Tree-Map approach is a step forward in the visualization of hierarchical information, and that it will produce benefits similar to those achieved by visualization techniques in other areas.

As humans we have the ability to recognize the spatial configuration of elements in a picture and notice the relationships between elements quickly. This highly developed visual ability allows people to grasp the content of a picture much faster than they can scan and understand text [12].

The Tree-Map visualization method maps hierarchical information to a rectangular 2-D display in a space-filling manner; 100% of the designated display space is utilized. Interactive control allows users to specify the presentation of both structural (depth bounds, etc.) and content (display properties such as color mappings) information. This is in contrast to traditional static methods of displaying hierarchically structured information, which generally make either poor use of display space or hide vast quantities of information from users. With the Tree-Map method, sections of the hierarchy containing more important information can be allocated more display space while portions of the hierarchy which are less important to the specific task at hand can be allocated less space [9,10].

Tree-Maps partition the display space into a collection of rectangular bounding boxes representing the tree structure [20]. The drawing of nodes within their bounding boxes is entirely dependent on the content of the nodes, and can be interactively controlled. Since the display size is user controlled, the drawing size of each node varies inversely with the size of the tree (i.e., # of nodes). Trees with many nodes (1000 or more) can be displayed and manipulated in a fixed display space.

The main objectives of our design are:

Efficient Space Utilization

Efficient use of space is essential for the presentation of large information structures.

Interactivity

Interactive control over the presentation of information and real time feedback are essential.

Comprehension

The presentation method and its interactive feedback must facilitate the rapid extraction of information with low perceptual and cognitive loads.

Esthetics

Drawing and feedback must be esthetically pleasing.

Hierarchical information structures contain two kinds of information: structural (organization) information associated with the hierarchy, and content information associated with each node. Tree-Maps are able to depict both the structure and content of the hierarchy. However, our approach is best suited to hierarchies in which the content of the leaf nodes and the structure of the hierarchy are of primary importance, and the content information associated with internal nodes is largely derived from their children.

2 Motivation: Current Methods and Problems

This work was initially motivated by the lack of adequate tools for the visualization of the large directory structures on hard disk drives.

Traditional methods for the presentation of hierarchically structured information can be roughly classified into three categories: listings, outlines, and tree diagrams. It is difficult for people to extract information from large hierarchical information structures using these methods, as the navigation of the structure is a great burden and content information is often hidden within individual nodes [23].

Listings are capable of providing detailed content information, but are generally very poor at presenting structural information. Listings of the entire structure with explicit paths can provide structural information, but require users to parse path information to arrive at a mental model of the structure. Alternatively, users may list each internal node of the hierarchy independently, but this requires users to manually traverse the hierarchy to determine its structure. Outline methods can explicitly provide both structural and content information, but since the structural indentation can only be viewed a few lines at a time, it is often inadequate [4].

The number of display lines required to present a hierarchy with both the listing and outline methods is linearly proportional to the number of nodes in the hierarchy. These methods are inadequate for structures containing more than a few hundred nodes. A great deal of effort is required to achieve an mental model of the structure in large hierarchies using these methods.

Tree drawing algorithms have traditionally sought efficient and esthetically pleasing methods for the layout of node and link diagrams. These layouts are based on static presentations and are common in texts dealing with graph theory and data structures. They are excellent visualization tools for small trees [2,10,12,13,17]. However, these traditional node and link tree diagrams make poor use of the available display space. In a typical tree drawing more than 50% of the pixels are used as background. For small tree diagrams this poor use of space is acceptable, and traditional layout methods produce excellent results. But for large trees, traditional node and link diagrams can not be drawn adequately in a limited display space. Attempts to provide zooming and panning have only been only partially successful [10].

Another problem with tree diagrams is the lack of content information; typically each node has only a simple text label. This problem exists because presenting additional information with each node quickly overwhelms the display space for trees with more than just a few nodes.

The presentation of content information in all of these traditional methods has usually been text based. Although tree diagrams are a graphically based method capable of making use of visualization techniques, and many of the ideas presented in this paper. Unfortunately, global views of large tree diagrams require the nodes to be so small that there is virtually no space in which to provide visual cues as to node content.

Tree-Maps efficiently utilize the designated display area and are capable of providing structural information implicitly, thereby eliminating the need to explicitly draw internal nodes. Thus much more space is available for the rendering of individual leaf nodes, and for providing visual cues related to content information.

Tree-Maps provide an overall view of the entire hierarchy, making the navigation of large hierarchies much easier. Displaying the entire information structure at once allows users to move rapidly to any location in the space. As Beard states in his paper on navigating large two-dimensional spaces [1], "If the two-dimensional information space fits completely onto a display screen, there is no navigation problem ... Users are never lost because they can see the complete information space."

3 A Directory Tree Example

Obtaining information about directory trees was the initial motivation for this research and provides a familiar example domain. For illustrative reasons, the hierarchy in this example is small and nodes have only an associated name and size. While reading through this example, think about how the techniques described would scale up to a directory tree containing 1000 files. An Apple Macintosh screen snapshot showing a Tree-Map of 1000 files from one of our laboratory's hard disk drives follows this example.

Presenting directory structures is a very practical problem. The following are the methods widely available today:

We are not aware of approaches that provide a visual representation of the relative sizes of files or directories.

Even moderately sized directory trees are difficult to visualize using standard operating system interfaces. With command line interfaces such as UNIX "ls" or DOS "dir", only the immediate children of any directory are listed. An overall view of the directory tree must be pieced together by traversing the various paths and listing the immediate children of the currently active directory.

Desktop metaphors and their windowing strategies are another alternative. One of the problems with windows is that they often obscure each other, and users may spend much of their time may be spent arranging windows. Also, the tree structure is not apparent unless windows have been carefully placed. Desktop icons generally show only the type of the file. Much richer visual mappings are possible but are currently not available, for instance, the depth of an icon's shadow could be used to indicate file size.

We will use a small directory tree hierarchy as an example. Tree A depicted in Figures 1 through 7 contains 23 nodes, of these 6 are directories (internal nodes) and 17 are files (leaf nodes). This tree is structured such that among siblings, file nodes always precede directory nodes.

In Figure 1 we see an outline view similar to the presentations provided by PCShell under DOS, the UNIX command "du" or Microsoft Windows 3.0. This presentation requires 23 lines; a structure with 1000 files would require a minimum of 1000 lines in order to present both directories and files.

Figure 2 presents a typical tree diagram, such drawings can be found in graph theory textbooks. This tree drawing approach is similar to the presentation method used by the OpenWindows File Manager. Directory trees with 1000 files cannot be drawn all at once on a typical screen (if all files are at the same level, each file node will have less than one pixel in which to draw itself). The problem becomes even more severe when real file names are used as node labels.
 
 



Figure 3 presents the same information in yet another manner, as a Venn diagram. We use this figure for illustrative purposes as a familiar and often used set theoretic visualization technique. It is an intermediate step which facilitates the transition from traditional presentations to Tree-Maps. This is an odd use of Venn diagrams, as one does not usually think of files and directories as sets. However, simple directory structures can be thought of as set theoretic collections of files, using only the containment (subset) property. Note that each node has been drawn proportionate to its size.

The space required between regions would certainly preclude this Venn diagram representation from serious consideration for larger structures. Note that this "waste" of space is also present in traditional tree diagrams. Using boxes instead of ovals and a bin-packing algorithm could partially solve this space problem. But bin-packing is an NP-complete problem and does not preserve order.

Figure 4 is a box-based Venn diagram which illustrates a more efficient use of space and is an excellent tool for the visualization of small hierarchies. But even the small degree of nesting present in this technique renders it unsuitable for the presentation of large hierarchies. Fortunately space efficient results can be achieved without bin-packing, using our "slice and dice" Tree-Map approach, a simple linear method in which the algorithm works top-down. An analogy should quickly illustrate this concept. If the hard disk drive were a large, flat, rectangular cheese, one could certainly slice it into chunks representing the size of each top level directory. Applying this slice and dice algorithm recursively to each piece of the cheese, and rotating the slicing direction 90 degrees at each recursive step, would result in the Tree Map of Figure 5.

Figure 5 simply eliminates the nesting offset used to seperate objects at each level. If we wanted to distribute our cheese to 17 people based on their weights, Figure 5 would give us a slicing diagram. This weight-proportionate distribution is one of the important features of Tree-Maps. The Tree Map snapshots of Figures 6 and 7 (see color plates) are the full color, machine generated screen snapshots of Figures 4 and 5. All screen snapshots in this paper have been made while using our TreeViz application on an Apple Macintosh II.

Figure 8 (see color plates) is a screen snapshot showing a Tree-Map of 1000 files. A simple color mapping has been used to code some of the various Macintosh file types: Tree-Map applications are red; all other applications are purple; system files are green; picture files are magenta; text files are yellow; archive files are cyan; and all other file types not currently of primary interest are gray. This Tree-Map shows 21 root level files on the left, followed by 19 root level directories moving across to the right. Detailed file information is displayed in a pop-up dialog window as the mouse is dragged over files in the display.
 
 




In this directory structure it can be observed that purple application files are generally the largest files on this disk, and take up relatively the same percentage of overall disk space as system related (green) files. A duplicate set of files exists just to the right of the vertical green bar. The files in this root level folder can be seen duplicated one level down in subfolders, as repeating geometric patterns offset 900 from their parent.

Since this Tree-Map portrays the overall allocation of disk space, the largest files can be located quite easily. Sorting a large directory listing by size would also make finding the largest files easy, but these files would not be presented in their original context. In addition, sorting a list on two or more properties (i.e. size and type) makes presentation of the results difficult. Tree-Maps make finding the largest system, application, and picture files on the disk as easy as finding the largest green, purple, and magenta rectangles in Figure 8. This is one simple example of the visual display properties possible; further discussion is contained in section 4.2.

4 The Tree Map Method

Displaying a directory tree while fully utilizing space and conveying structural information in a visually appealing and low cognitive load manner is a difficult task, as these are often opposing goals. Our interactive approach to drawing directory trees allows users to determine how the tree is displayed. This control is essential, as it allows users to set display properties (colors, borders, etc.) maximizing the utility of the drawing based on their particular task.

4.1 Structural Information: Partitioning the Display Space

Tree-Map displays look similar to the partition diagrams of quad-trees and k-D trees. The key difference is the direction of the transformation. Quad-trees create hierarchical structures to store 2-D images efficiently [18] while Tree-Maps present hierarchical information structures efficiently on 2-D display surfaces.

Tree-Maps require that a weight be assigned to each node, this weight is used to determine the size of a nodes bounding box. The weight may represent a single domain property (such as disk usage or file age for a directory tree), or a combination of domain properties (subject to Property 4 below). A nodes weight (bounding box) determines its display size and can be thought of as a measure of importance or degree of interest[9].

The following relationships between the structure of the hierarchy and the structure of its Tree-Map drawing always hold:

Properties

1) If Node1 is an ancestor of Node2, then the bounding box of Node1 completely encloses, or is equal to, the bounding box of Node2.

2) The bounding boxes of two nodes intersect iff one node is an ancestor of the other.

3) Nodes occupy a display area strictly proportional to their weight.

4) The weight of a node is greater than or equal to the sum of the weights of its children.

Structural information in Tree-Maps is implicitly presented, although it may also be explicitly indicated by nesting child nodes within their parent. Nesting provides for the direct selection of all nodes, both internal and leaf. Although the space required for nesting reduces the number of nodes which can be drawn in a given display space, and hence reduces the size of the trees that can be adequately displayed compared to non-nested drawings [21].

A non-nested display explicitly provides direct selection only for leaf nodes, but a pop-up display can provide path information as well as further selection facilities. Non-nested presentations cannot depict internal nodes in degenerate linear sub-paths, as the bounding boxes of the internal nodes in the sub-path may be exactly equal. Such paths seldom occur and tasks dependent on long chains of single child nodes will require special treatments.

4.2 Content Information: Mapping Content to the Display

Once the bounding box of a node is set, a variety of display properties determine how the node is drawn within it. Visual display properties such as color (hue, saturation, brightness), texture, shape, border, blinking, etc. are of primary interest, but the interface will not limit users to purely visual properties [6]. Color is the most important of these visual display properties, and it can be an important aid to fast and accurate decision making [11,15,16]. Auditory properties may also be useful in certain circumstances. Nodes may have many domain dependent properties, in which case a rich set of mappings exists between content information and display properties.

The drawing of individual nodes within their bounding boxes determines the content information statically presented in a Tree-Map. The number and variety of domain properties that can be statically coded in the drawing of the tree is limited. As Kuhn states, "Since human perception imposes an upper bound on the complexity of graphic representations, only a small number of relations can be shown."[7,14] Interactive control of the drawing is therefore critical because the mapping of content information to the display will vary depending on the information users require. Dynamic feedback is provided by a pop-up window which displays information about the node currently under the cursor.

For example, files could have weights (display size) proportional to their creation date, color saturation dependent on their last modification date, and pitch (tone heard while crossing border) based on size. Using this scheme it is easy to locate old files which have changed recently, and as the cursor crosses into their bounding box a deep tone tells users that the file is large even before they read the information about that file.

5 Algorithms

Algorithms are given to draw a Tree-Map and to track cursor movement in the tree. The algorithms may be applied to any tree, regardless of its branching degree. Both algorithms appear on the following page as Figures 9 and 10.

The basic drawing algorithm produces a series of nested boxes representing the structure of the tree.

The cursor tracking algorithm facilitates interactive feedback about the tree. Every point in the drawing corresponds to a node in the tree. While the current tracking point (from a mouse or touchscreen input device) is in a node, the node is selected and information about it is displayed.

5.1 Drawing Algorithm

The Tree-Map can be drawn during one pre-order pass through the tree in O(n) time, assuming that node properties (weight, name, etc.) have previously been computed or assigned. The current algorithm has been implemented in object-oriented Think C on a Macintosh II. The drawing algorithm proceeds as follows:

1) The node draws itself within its rectangular bounds according to its display properties (weight, color, borders, etc.).

2) The node sets new bounds and drawing properties for each of its children, and recursively sends each child a drawing command. The bounds of a node's children form either a vertical or horizontal partitioning of the display space allocated to the node.

5.2 Tracking Algorithm

The path from the root of the tree to the node associated with a given point in the display can be found in time proportional to the depth of the node.

In our implementation, when a node draws itself it stores its bounding box in an instance variable. Every point in the Tree-Map corresponds to a node in the hierarchy, in addition every node is contained in the bounding box of the root node. Recall that each node’s bounding box completely encloses the bounding boxes of its children, and that the bounding boxes of sibling nodes never overlap. Finding the path to a node containing a given point thus involves only a simple descent through one path in the tree, until the smallest enclosing bounding box is found.

6 Coping with Size

A typical 13 inch display has a resolution of 640 x 480, or roughly 300,000 pixels. Drawing an 80mb directory tree (weight = disk usage) on such a display requires that each pixel represent 260 bytes, i.e., there are roughly 4 pixels per Kilobyte. Assuming that such a directory structure may contain roughly 3,000 files (as on one of our lab's hard disks) implies that there are approximately 100 pixels per file on average. A box with 10 pixels per side (roughly 4mm2) is easily selectable using a standard mouse or touchscreen device [19]. This average case analysis is only part of the story since file sizes may vary widely.

The range of file sizes on our hard disk varied from a few hundred bytes to well over one million bytes. In the Tree-Map of Figure 8, groups of very small files often become completely black regions as there is only enough space to draw their borders. Magnification over these regions or zooming can provide access to these files. But since the assignment of node weights can be user controlled, presumably the nodes with the greatest weights are of greatest interest and the nodes with the smallest weights are of least interest.
 
 

7 Future Research Directions

Further research includes the exploration of alternate structural partitioning schemes, appropriate visual display of both numeric and non-numeric content information, dynamic views such as animated time slices, and operations on elements of the hierarchy. Standard operations such as zooming, marking, selecting and searching also invite designers to explore variations on the Tree-Map strategy.

Dr. Ram Narrate-Singh, a visiting research scientist in our lab, is working on an alternate directory only approach to partitioning the display which we have termed "top-down". His implementation on a Sun SPARCstation preserves the traditional notion of having the root node at the top and the leaves at the bottom.

Animation, or time-sliced displays, could provide insight into evolving structures. For example, the hierarchical organization of a university could be mapped from the university level (root), to the college level, to the department level, to the research lab level. If weights were assigned based on personnel resources, it would be easy to see the structure of the university based on the distribution of employees, and hence understand its strengths and weaknesses. Furthermore, if the saturation of red was proportionate to the funds spent at each node, and the saturation of cyan (the inverse of red) was proportionate to the funds allocated, nodes (labs, departments, colleges) which were on budget would be shades of gray (equal amounts of red and cyan), nodes over budget would become increasingly red, and nodes under budget would become increasingly cyan. The magnitude of the nodes funding would range from black (small budgets and expenditures) to white (large budgets and expenditures). If a series of these displays are generated based on data over the last ten years, it would be possible to see how funding and personnel resources have evolved and been distributed within the university.

The range and variety of potential applications of this technology is vast. For instance, stock market portfolios are often hierarchically structured, animations over time of financial portfolios could be a valuable application of this technology.

8 Conclusion

We believe that space-filling approaches to the visualization of hierarchical information structures have great potential. The drawing algorithm we have given is quite general, and the numerous possibilities for mapping information about individual nodes to the display are appealing. The Tree-Map approach to visualizing hierarchical structures enables meaningful drawings of large hierarchies in a limited space.

Acknowledgments

We would like to acknowledge the support of the members of the Human-Computer Interaction Lab, whose suggestions and criticisms have been greatly appreciated. They have forced us to prove the value of Tree-Maps and allowed us to hone our presentations of the idea.

References

[1] David V. Beard and John Q. Walker II. Navigational techniques to improve the display of large two-dimensional spaces. Behavior & Information Technology, 9(6):451-466, 1990.

[2] A. Brüggemann-Klein and D. Wood. Drawing trees nicely with tex. Electronic Publishing, 2(2):101-115, July 1989.

[3] Stuart K. Card, George G. Robertson, and Jock D. Mackinlay. The information visualizer, an information workspace. In Proceedings of ACM CHIÕ91 Conference on Human Factors in Computing Systems, Information Visualization, pages 181-188. 1991.

[4] Richard Chimera, Kay Wolman, Sharon Mark, and Ben Shneiderman. Evaluation of three interfaces for browsing hierarchical tables of contents. Technical Report CAR-TR-539, CS-TR-2620, University of Maryland, College Park, February 1991.

[5] Donna J. Cox. The art of scientific visualization. Academic Computing, page 20, March 1990.

[6] Chen Ding and Prabhaker Mateti. A framework for the automated drawing of data structure diagrams. IEEE Transactions on Software Engineering, 16(5):543-557, May 1990.

[7] Richard Ellson. Visualization at work. Academic Computing, page 26, March 1990.

[8] Steven Feiner. Seeing the forest for the trees: Hierarchical display of hypertext structures. In ACM Proc. COIS88 (Conf. on Office Information Systems), pages 205-212, Palo Alto, CA, March 1988.

[9] George W. Furnas. Generalized fisheye views. In Proceedings of ACM CHIÕ86 Conference on Human Factors in Computing Systems, Visualizing Complex Information Spaces, pages 16-23. 1986.

[10] Tyson R. Henry and Scott E. Hudson. Viewing large graphs. Technical Report 90-13, University of Arizona, May 1990.

[11] Ellen D. Hoadley. Investigating the effects of color. Communications of the ACM, 33(2):120-139, February 1990.

[12] Tomihisa Kamada. On Visualization of Abstract Objects and Relations. Ph.D. thesis, University of Tokyo, Department of Information Science, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 JAPAN, December 1988.

[13] Donald E. Knuth. Fundamental Algorithms, volume 1 of the Art of Computer Programming. Addison-Wesley, Reading, MA, 2nd edition, 1973.

[14] Werner Kuhn. Editing spatial relations. In Proceedings of the 4th International Symposium on Spatial Data Handling, pages 423-432, Zurich, Switzerland, 1990.

[15] Lindsay W. MacDonald. Using colour effectively in displays for computer-human interface. DISPLAYS, pages 129-142, July 1990.

[16] John F. Rice. Ten rules for color coding. Information Display, 7(3):12-14, March 1991.

[17] George G. Robertson, Jock D. Mackinlay, and Stuart K. Card. Cone trees: Animated 3d visualizations of hierarchical information. In Proceedings of ACM CHIÕ91 Conference on Human Factors in Computing Systems, Information Visualization, pages 189-194. 1991.

[18] Hanan Samet. Design and Analysis of Spatial Data Structures. Addison-Wesley Publishing Co., Reading, MA, 1989.

[19] Andrew Sears and Ben Shneiderman. High precision touchscreens: Design strategies and comparisons with a mouse. International Journal of Man-Machine Studies, 34(4):593-613, April 1991.

[20] Ben Shneiderman. Tree visualization with tree-maps: A 2-d space-filling appoach. Technical Report CAR-TR-548, CS-TR-2645, University of Maryland, College Park, September 1990. to appear in ACM Transactions on Graphics.

[21] Michael Travers. A visual representation for knowledge structures. In ACM Hypertext’89 Proceedings, Implementations and Interfaces, pages 147-158. 1989.

[22] E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, 1983.

[23] Kim J. Vicente, Brian C. Hayes, and Robert C. Williges. Assaying and isolating individual differences in searching a hierarchical file system. Human Factors, 29(3):349-359, 1987.