CAR-TR-655
CS-TR-3022
SRC-TR-93-3
Sept. 1993
Revised Jan. 1994
The purpose of computing is insight, not numbers.
---Richard Hamming, 1962
Keywords: dynamic queries,
database search, information retrieval, direct manipulation, user
interface, human-computer interaction, visual languages, visual
information seeking
Abstract
Dynamic queries are a novel approach to information
seeking that may enable users to cope with information overload.
They allow users to see an overview of the database, explore it rapidly
(with 100 msec updates), and conveniently filter out unwanted information.
Users fly through information spaces by incrementally adjusting
a query (with sliders, buttons, and other filters) while continuously
viewing the changing results. Dynamic queries on the chemical
table of elements, computer directories, and a real estate database
were built and tested in three separate exploratory experiments.
These results show statistically significant performance improvements
and user enthusiasm more commonly seen with video games. Widespread
application seems possible but research issues remain in database
and display algorithms, and user interface design. Challenges
include methods for rapidly displaying and changing many points,
colors, and areas; multi-dimensional pointing; incorporation of
sound and visual display techniques that increase user comprehension;
and integration with existing database systems.
1. Introduction
Some innovations restructure the way people think
and work. Our experiences with dynamic queries interfaces suggest
that they offer a dramatic change from existing methods for querying
databases. While languages such as SQL have become standardized
and form fill-in interfaces have become widespread, now dynamic
queries empower users to perform far more complex searches by
applying visual information seeking strategies. As users adjust
sliders or select buttons, the results are continuously updated
within 100 milliseconds, enabling them to answer simple fact questions
and find patterns or exceptions.
Dynamic queries are an application of the direct manipulation principles in the database environment (Shneiderman, 1983). They depend on presenting a visual overview, powerful filtering tools, continuous visual display of information, pointing rather than typing, and rapid, incremental, and reversible control of the query.
Definition: dynamic queries describes the interactive
user control of visual query parameters that generate a rapid
(100 msec update) animated visual display of database search results.
1.1 Dynamic queries examples
The enthusiasm of users for dynamic queries appears
to emanate from the sense of control they gain over the database.
They quickly perceive patterns in the data, fly through the data
by adjusting sliders, and generate new queries in 100 msecs based
on what they discover through incidental learning. By contrast,
most database queries are specified by typing a command in a keyword-oriented
language such as SQL, DIALOG, or FOCUS, and the result is a tabular
list of tuples containing alphanumeric fields. This traditional
approach is appropriate in many problem solving tasks, but formulating
queries by direct manipulation and displaying the results graphically
has advantages in many situations.
For novices, learning to formulate queries in a command
language may take several hours and then they must deal with the
high level of errors in syntax and semantics (Welty, 1985; Borgman,
1986). Many projects have demonstrated that visual information
seeking methods can be helpful in formulating queries (Michard,
1982; Wong & Kuo, 1982; Kim, Korth & Silberschatz, 1988)
and graphical results in context, such as on a map (Egenhofer,
1990) or a wall (Robertson, Card & Mackinlay, 1993), aid comprehension.
For experts, the benefits of visual interfaces may
be still greater since they will be able to formulate more complex
queries and interpret intricate results. For example, air traffic
control could hardly be imagined without graphical situation displays
(looking down on all the planes in a sector). Network management
is an emerging application for which visual displays are a benefit
because of the extreme complexity in dealing with 10,000 or more
nodes and links (Consens, Cruz & Mendelzon, 1992). Similarly,
statisticians, demographers, or sociologists dealing with large
multi-dimensional databases will be able to explore and discover
relationships more easily using dynamic queries (Becker &
Cleveland, 1987).
Geographic applications emerge naturally as candidates
for dynamic queries. We built a system for real estate brokers
and their clients that allowed them to locate homes by adjusting
sliders for the price, number of bedrooms, distance from work,
etc. (Williamson & Shneiderman, 1992) (Figure 1). Each of
the 1100 homes satisfying the query appeared as a point of light
on a Washington, DC map. Users could explore the database to find
neighborhoods with high or low prices by moving a
slider and watching where the points of light appeared.
Geographic queries were supported by allowing users to mark the
locations where they and their spouse work. Then users could adjust
the sliders on distance to the work places to give intersecting
circles of acceptable homes. An empirical study was conducted
with 18 psychology undergraduates to compare the dynamic queries
approach to a natural language query facility (Q&A from Symantec),
and a 10-page paper listing. The counterbalanced within-subjects
design found statistically significant speed advantages for the
dynamic queries over either alternative for the three most difficult
of the five tasks (Figure 2). Subjective satisfaction dramatically
favored the dynamic queries. One subject remarked about the dynamic
queries treatment: "I don't want to stop, this is fun!"
Figure 1: The DC Homefinder dynamic query system
enables users to adjust the sliders for location, cost, number
of bedrooms, home type (house, townhouse, or condominium), and
features (Garage, Fireplace, Central Air Conditioning, or New
construction). The results are shown as points of light which
can be selected to generate a detailed description at the bottom
of the screen.
Figure 2: Means and standard deviations for response
times of 18 subjects to five queries with paper, natural language,
and dynamic queries interfaces. The results show an advantage
for the dynamic queries interface as query complexity increased
(Williamson & Shneiderman, 1992).
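The slider-driven home search described above can be sketched as a filter over in-memory records. This is a minimal illustration in Python, not the HomeFinder implementation: the Home record, its field names, the planar coordinates in miles, the reading of the bedroom slider as a minimum, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Home:
    x: float        # map position in miles (hypothetical planar coordinates)
    y: float
    price: int      # dollars
    bedrooms: int


def within(home, place, radius):
    """True if the home lies inside the circle of `radius` miles around `place`."""
    return hypot(home.x - place[0], home.y - place[1]) <= radius


def matching_homes(homes, price_range, min_bedrooms,
                   work_a, radius_a, work_b, radius_b):
    """Homes that satisfy every slider setting at once.

    The two distance sliders define circles around the two work places;
    a home must fall inside the intersection of the circles.
    """
    low, high = price_range
    return [
        h for h in homes
        if low <= h.price <= high
        and h.bedrooms >= min_bedrooms
        and within(h, work_a, radius_a)
        and within(h, work_b, radius_b)
    ]


# Made-up sample data, for illustration only.
homes = [
    Home(1.0, 1.0, 120_000, 3),
    Home(4.0, 0.0, 95_000, 2),
    Home(2.0, 0.5, 180_000, 4),
    Home(9.0, 9.0, 110_000, 3),
]
result = matching_homes(homes, (100_000, 200_000), 3,
                        work_a=(0.0, 0.0), radius_a=3.0,
                        work_b=(3.0, 0.0), radius_b=3.0)
```

In an interactive system this function would be re-run (or incrementally updated) on every slider movement, and the returned homes drawn as points of light.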
Another geographic application we built highlighted
entire states of the United States that had cancer rates above
a specified value (Plaisant, 1993) (Figure 3). Users could explore
the database by looking at different years, or adjusting sliders
to select statistics by per capita income, percent college educated,
percent smokers, etc. The rapid change in colors, accomplished
with color indexing on the palette, enabled users to detect changes
in cancer rates over time and correlations with demographic variables.
An extended system built in our lab is being distributed to statisticians
by the National Center for Health Statistics.
Figure 3: This dynamic query shows cervical cancer
rates from 1950 to 1970 in each state. Adjustments can be made
to the year and to state demographic variables such as percent
college educated, per capita income, and percent smokers.
Applications seem abundant for traditional services
that have geographic aspects (travel information systems, hotel
or resort selection, choosing a college) as well as scientific
or engineering applications (electronic circuits, networks, satellite
ground coverage, astronomical star guides). Another likely candidate
for output is a calendar or time line in which queries might specify
event types (concerts, meetings, conferences) selected by cost,
priority rankings, or distance from home.
A dynamic queries tool for the chemical table of
elements was built with sliders for atomic mass, atomic number,
atomic radius, ionic radius, ionization energy, and electronegativity
(Ahlberg, Williamson & Shneiderman, 1992) (Figure 4). Appropriate
chemicals are highlighted and students can refine their intuitions
about the relationships among these properties and the atomic
number or position in the table. The dynamic queries approach
to the chemical table of elements was tested in an empirical comparison
with a form fill-in query interface with graphic output (FG) and
a form fill-in interface with textual output (FT). The counterbalanced
within-subjects design with 18 chemistry students showed strong
advantages for the dynamic queries in terms of faster performance
and lower error rates (Figure 5).
Figure 4: The chemical table of elements makes a
natural visual display for information on chemical properties.
Chemicals matching the query are shown in red. Gaps and jumps are
easily found.
Figure 5: Mean response times for 18 subjects performing
five tasks with the dynamic queries, form fill-in plus graphic
output (FG), and form fill-in with tabular output (FT). The results
strongly favor the dynamic queries interface (Ahlberg, Williamson
& Shneiderman, 1992).
When there are no natural graphical displays for
the output, dynamic queries can still be implemented with result
sets shown in a traditional alphanumeric tabular display (Figure
6). In our implementation the sliders and buttons are created
semi-automatically by the program depending on the values that
exist in the imported ASCII database. As the users press down
on the mouse and adjust the sliders, the result bar on the bottom
changes dynamically to indicate how many items remain in the result
set, but the tabular display is updated only when the user releases
the mouse button. This policy was adopted to avoid the distraction
of the frequent rewriting of the display.
Figure 6: Even when there is no natural graphic framework
for a dynamic query display, the method can be used with tabular
alphanumeric output. As users adjust the sliders and buttons for
the query, the result bar along the bottom indicates how many
items match. When the users stop moving the sliders and let go
of the mouse button, the tabular display is re-written.
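The update policy described above (result count refreshed continuously, table rewritten only on release) can be sketched as follows. This is a hedged illustration, not the authors' code: the class name, the method names, and the single "maximum size" slider are assumptions introduced for the example.

```python
class TabularQueryView:
    """Size query over rows: count updates on drag, table updates on release."""

    def __init__(self, rows):
        self.rows = rows                    # list of (name, size_kb) tuples
        self.result_count = len(rows)       # value shown in the result bar
        self.displayed_rows = list(rows)    # rows currently shown in the table
        self._pending = list(rows)          # latest result set, not yet displayed

    def on_slider_drag(self, max_size_kb):
        """Called continuously while the mouse button is held down."""
        self._pending = [r for r in self.rows if r[1] <= max_size_kb]
        self.result_count = len(self._pending)   # result bar changes immediately

    def on_mouse_release(self):
        """Called once when the user lets go; only now is the table rewritten."""
        self.displayed_rows = list(self._pending)


# Made-up rows, for illustration only.
view = TabularQueryView([("a.txt", 4), ("b.c", 120), ("c.o", 35)])
view.on_slider_drag(max_size_kb=50)          # user is still dragging
count_while_dragging = view.result_count
rows_while_dragging = len(view.displayed_rows)
view.on_mouse_release()
rows_after_release = [name for name, _ in view.displayed_rows]
```

Separating the cheap feedback (a count) from the expensive one (rewriting the table) keeps the display stable while the slider is in motion.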
Another tabular dynamic query was built to allow users to explore UNIX directories (Liao, Osada & Shneiderman, 1992) (Figure 7). Sliders for size (in kilobytes) and age (in days) of files enabled 18 users to answer ten questions such as "How many files are younger than umcp_tai?" The three versions of the program explored approaches to showing the results in standard "ls -l" format:
highlighting matches with color, highlighting matches
with asterisks on the same line, and displaying only the matching
lines (that is, delete the non-matching files from the display).
The last approach, called Expand/contract, was distracting if
updates were made as the slider was being moved, so re-displays
occurred when users stopped moving the sliders and let go of the
mouse button. In five of the tasks there was a statistically significant
speed advantage for the Expand/contract interface. This result
occurred only with medium sized directories of approximately 60
entries (two screen pages), and not with a smaller one screen
page directory. The benefits of Expand/contract seem likely to
grow as the directory size increases. These results help us develop
guidelines and theories about how to design displays for dynamic
queries.
Figure 7: The standard "ls -l" tabular
display can be the framework for UNIX directory exploration. The
sliders, built with Sun DevGuide, allow selections to be made
on the age (in days) and size (in kilobytes) of files. Color highlighting
and expand/contract methods of display were compared in an exploratory
study.
1.2 Advantages
The advantage of dynamic queries seems to be that
they allow users to rapidly, safely, and even playfully explore
a database. Users may be able to discover quickly which sections
of the multi-dimensional search space are densely or sparsely
populated, where there are clusters, exceptions, gaps, or outliers,
and what trends there are in ordinal data. Such overviews, the
ability to explore, and the capacity to rapidly specify known
item queries make dynamic queries an appealing approach for certain
problems. The advantages of rapid reformulation of queries were
hinted at in the early work on textual interfaces in Rabbit (Williams,
1984). The advantages of visual input and output were explored
in some statistical display programs (Buja, McDonald, Michalak,
& Stuetzle, 1991). Dynamic queries achieve these advantages
by applying direct manipulation strategies (Shneiderman, 1992):
- Visual presentation of query components
- Visual presentation of results
- Rapid, incremental and reversible actions
- Selection by pointing (not typing)
- Immediate and continuous feedback
For data in which there is a known relationship among
variables, the dynamic queries interface is useful for training
and education by exploration. For situations in which there are
understood correlations, but their complexity makes it difficult
for non-experts to follow, dynamic queries can allow a wider range
of people to explore the interactions (health and demographic
variables, chemical table of elements, economic or market data).
Finally, where there is so much data that even experts have not
sorted out the correlations, dynamic queries may help users to
discover patterns, form and test hypotheses, identify exceptions,
segment data, or prepare figures for reports (studying clinical
trial data, picking stocks, finding a home).
1.3 Disadvantages
The disadvantages of dynamic queries stem largely
from their poor match with current hardware and software systems.
First, the requirement for rapid performance in search algorithms
and display strategies cannot be easily satisfied with current
database management tools. Therefore, we are exploring which data
structures and algorithms accommodate large datasets and permit
rapid access (Jain & Shneiderman, 1993). Rapid graphical display
methods are especially useful for dynamic queries, but these are
not widely available.
Second, application specific programming is needed
to take the best advantage of dynamic query methods. While we
have developed some standardized tools, they still require conversion
of data and possibly some programming. Standardized input and
output plus software toolkits would make dynamic queries easier
to integrate into existing database and information systems. Alternatively,
dynamic queries could be generated by User Interface Builders
or User Interface Management Systems.
Third, our current dynamic queries implement only
simple queries that are conjunctions of disjunctions, plus range
queries on numeric values. Our filter/flow metaphor offers one
approach to providing full boolean functionality (see Figure 9),
but this prototype must still be refined and implemented within
database management systems (Young & Shneiderman, 1993). More
elaborate queries (group by, set matching, universal quantification,
transitive closure, string matching) are contemplated but these
are still research and development problems.
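The class of queries named in the third point, conjunctions of disjunctions plus numeric ranges, can be written down compactly. The sketch below is illustrative only; the function name, the dictionary-based record layout, and the sample records are assumptions.

```python
def matches(record, choices, ranges):
    """Evaluate a query that is an AND across attributes.

    choices: attribute -> set of accepted values (an OR within one attribute)
    ranges:  attribute -> (low, high) inclusive numeric bounds
    """
    for attr, accepted in choices.items():
        if record[attr] not in accepted:
            return False
    for attr, (low, high) in ranges.items():
        if not (low <= record[attr] <= high):
            return False
    return True


# Made-up records, for illustration only.
records = [
    {"type": "house", "price": 150_000, "bedrooms": 3},
    {"type": "condominium", "price": 90_000, "bedrooms": 1},
    {"type": "townhouse", "price": 125_000, "bedrooms": 2},
]
choices = {"type": {"house", "townhouse"}}      # house OR townhouse
ranges = {"price": (100_000, 140_000)}          # AND price within the range
selected = [r for r in records if matches(r, choices, ranges)]
```

A query such as "(house AND under $100,000) OR (condominium AND over $200,000)" cannot be expressed in this form, which is the limitation that motivates fuller boolean functionality.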
Fourth, visually handicapped and blind users will
have a more difficult time with our widgets and outputs, but we
feel that we could accommodate these users as well, and we are
exploring audio feedback.
2. Research Directions
"Visualization is a method of computing. It
transforms the symbolic into the geometric, enabling researchers
to observe their simulations and computations. Visualization offers
a method for seeing the unseen. It enriches the process of scientific
discovery and fosters profound and unexpected insights. In many
fields it is already revolutionizing the way scientists do science."
(McCormick et al., 1987)
While our initial implementations of dynamic queries
have generated enthusiasm, we are more aware of challenges than
successes. Research opportunities appear rich in database and
display algorithms and user interface design.
2.1 Database and display algorithms
Since rapid update of the display is essential, existing
algorithms to store and retrieve multi-dimensional information
need refinement (Bentley & Friedman, 1979; Samet, 1989). For
small databases that fit in main memory, we experimented with
array indexing, grid structures, quad trees, and k-d trees. Linear
array structures with pointers were effective with small databases,
but their inefficient use of storage limited the size of the databases
that could be handled. With uniform distributions, grid file structures
were efficient, while the quad and k-d trees became more attractive
as the distributions became more skewed (Jain & Shneiderman,
1993).
For larger, disk-based databases, alternatives include
R-trees (Guttman, 1984; Beckmann, 1990), grid files (Nievergelt,
1984), multiple B-trees, and reduced combined indices (Lum, 1970;
Shneiderman, 1977). When inserts and deletes to the stored information
are treated separately, the design of efficient data structures
is simplified.
In dynamic queries, there is always a current query
result on display and each new query is a slightly enlarged or
contracted version of the current query. It seems that special
data structures and algorithms might allow rapid update of the
results. The innovations here stem from the fact that structures
must be kept largely in high-speed storage to ensure that the
rapid performance demands are met. We believe that effective strategies
will stem from the organization of data along each dimension in
buckets adjusted to the granularity of the slider mechanism. For
example, if the slider has 100 positions for a field whose range
is 1 to 50,000, then the data should be organized into 100 buckets
each covering 500 points on the field. Then as the slider is moved
to increase the selected set, the buckets can be appended to the
selected set. As the slider is moved to decrease the selected
set, the buckets can be removed. With three dimensions of 100
buckets each, the database is conveniently broken into 1,000,000
buckets which can be stored and retrieved efficiently.
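The bucket arithmetic in this paragraph (a range of 1 to 50,000 over 100 slider positions gives buckets of 500 field units) can be illustrated with a small sketch. It is one possible reading of the idea, not a published data structure: the class name, the method names, the treatment of the slider as an upper bound, and the sample values are assumptions.

```python
from collections import defaultdict


class BucketedDimension:
    """One slider dimension, pre-sorted into buckets that match slider positions."""

    def __init__(self, values, low, high, positions):
        # values: dict of item id -> field value, with low <= value <= high
        self.low = low
        self.positions = positions
        self.width = (high - low + 1) / positions    # field units per bucket
        self.buckets = defaultdict(list)
        for item, value in values.items():
            self.buckets[self.bucket_of(value)].append(item)

    def bucket_of(self, value):
        index = int((value - self.low) // self.width)
        return min(index, self.positions - 1)

    def delta(self, old_pos, new_pos):
        """Items to add or remove when an upper-bound slider moves.

        Position p means buckets 0..p-1 are selected, so only the buckets
        between the old and new positions need to be touched.
        """
        if new_pos >= old_pos:
            added = [i for b in range(old_pos, new_pos)
                     for i in self.buckets.get(b, [])]
            return added, []
        removed = [i for b in range(new_pos, old_pos)
                   for i in self.buckets.get(b, [])]
        return [], removed


# Made-up values, for illustration only.
prices = {"a": 250, "b": 700, "c": 1_200, "d": 49_900}
dim = BucketedDimension(prices, low=1, high=50_000, positions=100)

grow = dim.delta(0, 2)      # slider moved up: buckets 0 and 1 are appended
shrink = dim.delta(2, 1)    # slider moved back: bucket 1 is removed
```

The cost of each slider step depends on the contents of the buckets crossed, not on the size of the whole database, which is the property the paragraph is after.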
Data compression methods are important to allow larger
databases to fit in 32 megabyte or smaller address spaces. Alternatively,
parallel hardware and algorithms that search multiple storage
spaces may be needed.
Screen management algorithms also play an important
role. When a slider is moved or a button depressed, instead of
repainting the entire display it is often more effective to merely
repaint the areas or points that have changed. Our early efforts
suggest that in some cases manipulation of the palette by color
indexing may be effective in providing rapid changes for irregularly
shaped regions, even on popular personal computers (Plaisant,
1992). New classes of algorithms for screen management are anticipated
as an alternative to more expensive hardware.
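Palette manipulation by color indexing can be sketched as follows: each region is painted once with its own palette index, and a query change rewrites palette entries rather than pixels. This is a simplified model of the technique, not the code of the cited system; the colors, the tiny frame buffer, and the rate values are made up for the example.

```python
HIGHLIGHT = (255, 0, 0)
NORMAL = (128, 128, 128)


def build_palette(rates, threshold):
    """One palette entry per region; entry i colors every pixel storing index i."""
    return [HIGHLIGHT if rate > threshold else NORMAL for rate in rates]


def changed_entries(old_palette, new_palette):
    """Indices whose color differs: the only entries that must be rewritten."""
    return [i for i, (old, new) in enumerate(zip(old_palette, new_palette))
            if old != new]


def render(frame_buffer, palette):
    """Resolve region indices to colors (done by display hardware in practice)."""
    return [[palette[index] for index in row] for row in frame_buffer]


# The frame buffer stores a region index per pixel and is painted only once,
# however irregular the region shapes are.
frame_buffer = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
]
rates = [4.1, 7.3, 9.8]     # made-up rate for regions 0, 1, 2

palette = build_palette(rates, threshold=8.0)
new_palette = build_palette(rates, threshold=5.0)
dirty = changed_entries(palette, new_palette)
```

Lowering the threshold from 8.0 to 5.0 changes a single palette entry, so the update cost is independent of how many pixels region 1 covers.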
2.2 User-Interface Design
As humans we have the ability to recognize the spatial
configuration of elements in a picture and notice the relationships
among elements quickly (Tufte, 1983; Foley et al., 1990). This
highly developed visual system allows people to grasp the content
of a picture much faster than they can scan and understand text.
By shifting some of the cognitive load of information retrieval
to the perceptual system designers can capitalize on a well developed
human visual processing capability. Appropriate static coding
of properties, by size, position, shape, and color, can greatly
reduce the need for explicit selection, sorting, and scanning
operations. However, our understanding of when and how to apply
these methods is poor, and basic research is needed.
Graphical display properties such as color (hue,
saturation, brightness), texture, shape, border, blinking, etc.
are of primary interest. Color is the most effective of these
visual display properties, and it can be an important aid to fast
and accurate decision making (Ding & Mateti, 1990; MacDonald,
1990; Marcus, 1992). Auditory properties may be useful in certain
circumstances (e.g. lower frequency sounds would be associated
with large values and higher frequency sounds with small values),
especially as redundant reinforcement feedback (Blattner, Greenberg
& Kamegai, 1992).
Basic research on color, sound, size and shape coding, etc. needs to be reexamined in the context of dynamic queries. Rapid and smooth screen changes are understood to be essential for perception of patterns, but we would like to develop more precise requirements to guide designers.
In our experience, delays of more than 2-3 tenths
of a second are distracting, but precise requirements with a range
of situations and users would be helpful.
Although our initial results are encouraging, there
are many unanswered user interface design questions that need
exploration, for example how to:
- design widgets to specify multiple ranges of values, such as between 14 and 16 or between 21 and 25,
- enable users to express boolean combinations of slider settings,
- choose among highlighting by color, points of light, regions, blinking, etc.,
- allow varying degrees of intensity in the visual feedback,
- cope with thousands of points or areas by zooming,
- permit weighting of criteria,
- select a set of sliders from a large set of attributes,
- provide "grand tours" to automatically view all dimensions,
- include sound as redundant or unique coding of one dimension, and
- support multi-dimensional input.
Other issues emerge when no natural two-dimensional
representation of the data can be identified. Of course a textual
representation can always be used (e.g. the list of items highlighted
with color, inverse video, font type, and size) and we explored
such representations for the dynamic query of directory listings
(Figure 7). A natural possibility is to create a two dimensional
space such as a scattergram. Instead of showing homes as points
of light on a city map, they could be points of light on a graph
whose axes were the age of the house and its price. Sliders could
still be used for number of bedrooms, quality of schools, real
estate taxes, etc.
Treemaps are another mechanism for visualizing large
amounts of hierarchical information on which dynamic queries could
be applied (Johnson & Shneiderman, 1991). For example we built
a business application allowing the visualization of sales data
for a complete product hierarchy, color-coded by profitability
and size-coded by revenue. Twelve professional users in our usability
study could rapidly determine the state of financial affairs --
large red regions indicate trouble and blue areas signal success.
A slider allowed users to observe quickly the changes to the treemap
over time and find trends or spot problems.
A central issue is the design of appropriate widgets.
Even in our early explorations we were surprised that none of
the existing user interface management systems contained a double-boxed
slider that would permit specification of a range (e.g. cost of
a home is required to be more than $70,000 and less than $130,000).
In creating such a slider we discovered how many design decisions
and possibilities there were. In addition to dragging the boxes,
we had to contend with jumps, limits, display of current values,
what to do when the boxes were pushed against each other, choice
of colors, possible use of sound, etc.
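A few of the design decisions listed above (limits, jumps to fixed steps, and what happens when the boxes are pushed against each other) can be made concrete in a small model of a range slider. The sketch shows one possible set of choices, not the widget the authors built; the class name, the inclusive bounds, and the "stop at the other box" policy are assumptions.

```python
class RangeSlider:
    """Model of a two-box slider that selects an inclusive [low, high] range."""

    def __init__(self, minimum, maximum, step=1):
        # Assumes (maximum - minimum) is a multiple of step.
        self.minimum, self.maximum, self.step = minimum, maximum, step
        self.low, self.high = minimum, maximum

    def _snap(self, value):
        # Clamp to the track limits, then round to the nearest step.
        value = max(self.minimum, min(self.maximum, value))
        steps = round((value - self.minimum) / self.step)
        return self.minimum + steps * self.step

    def drag_low(self, value):
        # One possible policy: the low box stops when it meets the high box.
        self.low = min(self._snap(value), self.high)

    def drag_high(self, value):
        # Symmetrically, the high box cannot pass below the low box.
        self.high = max(self._snap(value), self.low)

    def contains(self, value):
        return self.low <= value <= self.high


slider = RangeSlider(0, 500_000, step=10_000)
slider.drag_low(74_000)       # snaps to 70,000
slider.drag_high(130_000)
selected_range = (slider.low, slider.high)
```

An alternative policy would let one box push the other along the track; either choice has to be made explicit, which is part of why the widget design proved more involved than expected.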
On the input side we realize that existing widgets
are poorly matched with the needs of expert users who are comfortable
with multi-dimensional browsing. Two-dimensional input widgets
to select two values at once are not part of any standard widget
set that we have reviewed, necessitating their creation (Figure
8). The benefits of a single widget are that only one selection
is required to set two values and that correct selections can
be guaranteed (the dotted areas indicate impossible selections,
for example, the cheapest 7 bedroom house is $310,000).
Figure 8: Two prototype two-dimensional widgets.
The top one specifies a point indicating the number of bedrooms
(3) and cost of a home ($300,000) with a single selection. The
bottom one specifies a range of bedrooms (3 to 4) and cost ($180,000
to $320,000).
Three and higher dimensional input widgets may facilitate
exploration of complex relationships. Current approaches for high
dimensional input and feedback are clumsy, but research with novel
devices such as data gloves (Feiner & Beshers, 1990), Polhemus
devices, the SpaceBall, or various 3-D mice may uncover effective
methods. With a 3-D mouse users lift the mouse off the desk and
move it like a child playing with a toy airplane. The mouse system
continuously outputs the six parameters (six degrees of freedom
- 6DOF) that define its linear and angular position with respect
to a fixed coordinate system in space.
Designers can always decompose the rotation motion
of the mouse into the combination of (1) a rotation around the
handle of the mouse and (2) a change in the direction where this
handle is pointing. When the mouse is held as a pointer, the rotation
around the handle is created by a twist of the arm, and it may
be natural to users to make the same twisting motion to increase
the level of a database parameter as they would to increase the
volume of a car radio. Changing the pointing direction of the
mouse handle is done by the same wrist flexion that a lecturer
would use to change the orientation of a laser pointer to point
at another part of the conference screen. It may then also feel
natural to users to imagine the planar space of two database parameters
as vertical in front of them and point at specific parts by flexing
their wrist up, down and sideways.
For example, sophisticated users could perform a
dynamic query of the periodic table of elements using the 3-D
mouse. They would find elements of larger atomic mass by translating
the mouse upward; for larger atomic numbers they would move to
the right; for larger ionization energies they would move toward
the display; for larger atomic radius they would bend their wrist
up; for larger ionic radius they would bend their wrist to the
right; for larger electronegativity they would twist their arm
clockwise. Sliders should probably still be present on the screen,
but would move by themselves and give feedback on parameter values.
Another input issue is how to specify alphanumeric
fields. While a simple type-in dialog box is possible, more fluid
ways of roaming through the range of values are helpful. To this
end we began to develop an "alphaslider" to allow users
to quickly sweep through a set of items that might be the days
of the week or the 6000 names of actors in a database of movies
(Ahlberg & Shneiderman, 1994).
On the display side many questions have been opened
up by our initial efforts. Sometimes points on a map are a natural
choice, but non-overlapping areas and overlapping areas seem useful
in some applications. Points and areas can be on or off (in which
case monochrome displays may be adequate), but we believe that
color coding may allow more information to be displayed. Texture
and shape coding, plus sound are also appealing directions.
To cope with the larger problem of specifying complex
boolean combinations of attribute values we have developed the
filter/flow model (Young & Shneiderman, 1993). Figure 9 shows
how it might be applied to help students choose colleges. Users
can select from the set of attributes and get an appropriate filter
widget (type-in for interest areas, sliders for cost, and buttons
for scholarships) which is placed on the screen with flow lines
showing ANDs (sequential flow) and ORs (parallel flows). The X
in each filter widget could be selected to negate the filter values.
Clustering of one-in-one-out segments to form a new and save-able
filter is possible. This approach was shown to be statistically
significantly more effective than SQL for composing and comprehending
queries.
Figure 9: A mockup of a filter/flow boolean query
((Interests = English or Literature or Journalism) AND ((Tuition
greater than or equal to $2200 and less than or equal to $4500)
OR ((Tuition greater than or equal to $5100) AND (Scholarships
are available by Work-Study or Assistantship)))) combined with
map output to show the result (Dartmouth, Grinnell, and the Univ.
of Maryland).
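The filter/flow semantics (sequential flow for AND, parallel flows for OR, the X for negation) can be sketched with three small combinators. This is an illustration of the model's logic only, not the prototype's implementation: the function names and the record layout are assumptions, the colleges are invented placeholders rather than data about real institutions, and the first tuition condition is read as a range from $2200 to $4500.

```python
def filt(predicate, negate=False):
    """A single filter widget; `negate` models selecting the X on the widget."""
    return lambda record: bool(predicate(record)) != negate


def sequence(*stages):
    """Filters placed one after another on a flow line: all must pass (AND)."""
    return lambda record: all(stage(record) for stage in stages)


def parallel(*branches):
    """Flow lines that split and rejoin: any one branch may pass (OR)."""
    return lambda record: any(branch(record) for branch in branches)


interests = filt(lambda c: c["interests"] & {"English", "Literature", "Journalism"})
mid_tuition = filt(lambda c: 2200 <= c["tuition"] <= 4500)
high_tuition = filt(lambda c: c["tuition"] >= 5100)
aid = filt(lambda c: c["scholarships"] & {"Work-Study", "Assistantship"})

query = sequence(interests,
                 parallel(mid_tuition, sequence(high_tuition, aid)))

# Invented placeholder records, for illustration only.
colleges = [
    {"name": "A", "interests": {"English", "History"}, "tuition": 3000,
     "scholarships": set()},
    {"name": "B", "interests": {"Journalism"}, "tuition": 6000,
     "scholarships": {"Work-Study"}},
    {"name": "C", "interests": {"Literature"}, "tuition": 6000,
     "scholarships": {"Athletic"}},
    {"name": "D", "interests": {"Physics"}, "tuition": 3000,
     "scholarships": {"Work-Study"}},
]
passed = [c["name"] for c in colleges if query(c)]
```

Because a sequence or a parallel group is itself a filter with one input and one output, it can be nested or saved as a new filter, which mirrors the clustering of one-in-one-out segments described in the text.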
3. Summary
Dynamic queries offer a lively new direction for
database query. Many problems that are difficult to deal with
using a keyword-oriented command language become tractable with
dynamic queries. Contemporary computers have become fast enough
that this direct manipulation approach can be applied for modest-sized
problems and still ensure an update time of under 100 msec. The
challenge now is to broaden the spectrum of applications by improved
user interface design and by fast database search plus compression
methods.
Dynamic queries can become a general approach that
is attached to every database system, spreadsheet, and many stand-alone
applications. The benefits accrue both to novice and expert users.
Research directions include: (1) database and display algorithms,
and (2) user interface design.
Acknowledgements:
Thanks for the creative implementation efforts of Christopher Ahlberg, Christopher Williamson, Holmes Liao, Boon-Teck Kuah, and Vinit Jain for ta