Bill Buxton
Alias | Wavefront Inc. & University of Toronto
Toronto, Canada
We argue that the current notions of television, interactive television and personal computers are inadequate to support the potential offered by the ongoing convergence of media. To reap that potential, we must rethink not only the nature of the appliances with which we interact, but also where they are located.
What follows is an exploration of the technological appliances in our lives and our
perceptions of them. We argue that these perceptions are largely shaped by the parts of
these appliances we see, touch and hear. Our discussion is framed around the appliances
that are the keys to "Multimedia" and the "Information Highway." That
is, things like computers, televisions and telephones.
Our concerns are the increasing complexity of such appliances, on the one hand, and
strategies for their redesign, on the other. Given the influence that the physical design
and location of an appliance play in our perception of it, our main argument is that
the best way to reduce complexity and improve design is by rethinking these very same
things: physical design and location.
So, for example, the question is not, "How do we do interactive television?"
Rather, it is, "How do we redefine what a television is and where it should be?"
It is not, "How do we make a usable multimedia computer?" It is, "How do we
make computers disappear, and where do we want access to computation?"
In short, we argue that the goal should not be to make new applications run in the
existing appliance, or box. The solution to achieving the potential of technological
convergence is to completely redesign the box, or better yet, make it disappear into the
ecology of our living space altogether.
This is neither a case study nor a conventional research paper. Rather, it is a discussion
of issues that have emerged from a larger body of research and simply living with
experimental systems - issues that appear to have relevance in helping guide us from the
conventional appliance model of technology toward something more transparent, yet no less
effective.
Around 1978 I directed a project at the University of Toronto which developed one of
the first all-digital interactive systems for music composition, sound design, synthesis,
and performance (Buxton, Sniderman, Reeves, Patel & Baecker, 1979). Essentially, it
had the functional properties of a Macintosh with a graphical score editor, voicing
program, and digital sound synthesizer. Musicians from about ten different countries, with
no computer experience, came to realize compositions with this system. After an hour or
two, they were able to work on their own. The system ran on a PDP-11/45, with some custom
input and output devices, as well as a custom digital synthesizer.
In the same room where this operated, there was a PDP-11/40, which had a card reader, a
tape drive, and D/A converters. Students from the Faculty of Music who were studying
electronic and computer music used this latter system for their course work. They punched
a stack of cards defining their job and brought them to the I/O clerk who would then feed
them into the 11/40. If all went well, the music software would grind out the samples of
the resulting music and write them onto the tape drive. This tape would then be
"played back" through the D/A converters and recorded onto audio tape. Normal
turn around was overnight. (Incidentally, this procedure was typical for computer music at
the time.)
Realizing the discrepancy between these two music systems, I approached the Dean of the
Faculty of Music, who also directed the electronic music program. I offered that his
students would be welcome to use my system for their course work, rather than the batch
system. His response was, "No, because if they use your system they will not be
learning to use a computer, and what they do learn will be of no use when they
graduate."
What this response said to me was that his view of a computer was dominated by the process
of interaction and the input/output devices through which this interaction took place.
This was despite the fact that both systems were running on virtually identical CPUs. Now
this was an intelligent person, literate in technology. Rather than surprising me, his
response simply emphasized the degree to which what he saw, felt and heard shaped his
model of computers and computer music.
There are two exercises that we can undertake to push this point further. With yourself, a class or some other audience, do the following:
Exercise 1: In 15-20 seconds, draw a computer.
Exercise 2: Imagine that it is 1960. Again, in 15-20 seconds, draw a computer.
I have had over 500 people do this. In response to the first exercise, the overwhelming
majority draw a monitor and a keyboard. A smaller but significant number draw a mouse as
well. What is interesting is that relatively few actually draw the box that contains the
CPU, itself. A typical response is illustrated in Figure 1.
Figure 1: A typical 15-second drawing of "a computer."
The responses that I get to Exercise 2 are more varied. (I suspect that this is due
to the likelihood that the majority of respondents were not alive in 1960, much less using
computers.) What people typically draw looks much like a collection of refrigerators and
washing machines, intended to represent the keypunch machines, card readers, tape drives
and line printers of the era. A representative example is shown in Figure 2.
Virtually nobody will draw a schematic of an arithmetic/logic unit (ALU), and/or a set of
data registers interconnected by a data and an address bus. From an engineer's
perspective, this may be the "correct" response. (At least it would more-or-less
lead to the same result in each exercise.)
The consistency of the results is very telling. The key to their significance lies in the
observation that what people draw are the input/output devices of the computer, not the
computer itself. What this emphasizes first is the power of what users see and touch (the
input/output devices) to shape their mental model of the system. Second, it reminds us
that these very same input/output transducers are "accidents of history" and
therefore candidates for change.
Figure 2: A representative 15-second drawing of a circa 1960
"computer."
These are two of the most powerful observations that a designer of computers could have. What they say is:
1. You can change the input/output devices.
2. By your choice, you can have a huge influence on shaping the end user's mental
model of the system.
From these simple exercises emerges the power to fundamentally change the perception of
computation forever. And what is true here for computation, is equally true for
television.
Today, people's concept of computation and television is plagued by a sameness that is
stifling growth toward the true potential that is there. The prevalent "one size fits
all" situation is highlighted by the consistency in the results of Exercise 1. So
let's talk about computers for a moment.
Just as Henry Ford is purported to have said of his automobiles, "You can have it in
any colour you want, as long as it is black," so do current computer manufacturers
say, "You can have it in any form you want as long as it has a keyboard, display and
a mouse."
From twenty paces, all desktop computers running a GUI are indistinguishable, regardless
of whether they are PCs running Windows, UNIX boxes running Motif, or Macintoshes running
the Finder. Furthermore, portables are simply shrunken-down desktop machines. There are no
fundamental, or first-order, design differences across the bulk of the currently installed
base of systems.
But what of the users? How consistent are their skills or needs? Is the same basic tool
equally suited to the graphic artist, engineer or secretary? The question is rhetorical.
What should be evident is that we cannot get to the next level until we begin to tailor
systems to the specific skills, tasks and contexts of specific users.
Remembering our two drawing exercises, this cannot be reduced to "a simple matter of
programming." It is a need to fundamentally redefine the nature of what a computer
is. (But luckily, from those very same drawing exercises come the keys to doing so.)
The diversity of our media appliances must be driven by the diversity of locations and
the contexts in which they are found. Systems must be tailored to "absorb"
actions, information and artifacts from the physical world into the electronic domain, and
to "squeeze out" information from the electronic into the physical domain. From
the diversity of contexts and locations must come a diverse set of new appliances.
Think of television. We think of it in terms of the appliance known as the
"television set." (Unfortunately, because of the strong relationship between the
medium and the appliance, the same name applies to both.) In almost every article that I
have read about the next generation of television, the author remains fixated on the
conventional scenario of the TV appliance in the living room of the home. About the only
things that are new are 500 channels and a "set-top box." Applications, such as
video on demand and home shopping, are then squeezed into this poor, overworked and
under-capable technology. How much better if we were to break out of the box. Consider two examples.
Instead of "home shopping," what if interactive television were to bring you
"pump-side shopping?" While you were standing beside your car filling up with
petrol (truly a captive audience), the pump itself would be the "box." By
packaging it in a gas pump and moving it from the living room to the roadside, the entire
perception of the system would change, while the underlying technology would be almost
identical to what might be found in the home. Figure 3, for example, illustrates a
prototype of just such a pump which is being field tested today.
Figure 3: A prototype gasoline pump with integrated touch screen and media
delivery mechanism for "pump-side shopping." (TouchCom Technologies).
For my next example, I want you to think about the recent announcement by companies
such as Sony, NEC, Fujitsu and Matsushita of pending large flat plasma displays for
televisions. A unit available from Fujitsu, for example, is illustrated in Figure 4. So
now that this technology is coming, the box will change. But what does that mean? What can
we learn from this?
Figure 4: A flat-panel display mounted on wall (Fujitsu).
If you believe what I've been saying so far, then you might be asking yourself,
"If the box is changing, and moving from the floor to the wall, won't that change our
perception of the nature of TV?" I believe the answer to be yes. Let's do another
exercise that illustrates why:
Exercise 3: Write down the largest interactive display in your house and where it is located.
I can hear the pens already, all scratching in unison, "the television." Well,
by now you should know me better than that. I bet that the real answer, on reflection, is
the refrigerator in your kitchen!
If it is anything like mine, illustrated in Figure 5, it is covered with calendars, memos,
appointments, reminders, children's art and all kinds of other paraphernalia. And let's be
clear, it is not just a display. Just notice how dynamic the information on it is.
So I ask you, is not the fridge door one of the most logical places to put one of these
large flat displays? Think about being able to put notes on it with a stylus, interact
with it by touch, and read it or place things on it remotely, since - of course - it will
be on the Internet as well as cable.
Now if you think that this, or anything like this, is remotely likely in the next 5-10
years, then you are forced to ask, "How does this affect my conception of what
television is and could be?" "How might it affect what I do in the next 2
years?"
Figure 5: The refrigerator seen as a domestic interactive information appliance.
I am not saying that home shopping, set-top boxes and video on demand will not come. What I am suggesting is that they are such a small part of what the future is bringing that if we spend all of our time focusing our conceptual models on them, we will miss the heart of the gold mine while we are picking up little pieces around the edge.
How functionality and content are packaged and where they are delivered has a huge
impact on their usefulness and our perceptions of them. Therein lies a key problem with the
concentration of new applications in two "super appliances," the television with
set-top box and the multimedia computer.
Figure 6 illustrates this point by analogy. It shows three pocket knives that I own. The
one on the left has a number of gadgets built in, including scissors, a screwdriver and
a corkscrew. Because of the resulting size, I don't carry it in my pocket; rather, it
lives in my briefcase. Consequently, I can only make use of it when I have my briefcase
with me.
The one in the middle has far fewer gadgets and therefore, it is much smaller. It fits in
my pocket and so I have it with me nearly all the time. Consequently, despite lower
"functionality" I use it far more often. The left knife is like my portable
computer, which also lives in my briefcase. The middle one is more like my watch, which is
almost always with me.
The third knife, on the right, has the most functionality, but ironically, is the least
useful. In fact, its lack of usefulness is a direct result of overloading too much
functionality into one device. I am going to argue that this is precisely the same problem
that afflicts multimedia computers and overburdened television sets.
As a start, what I do with the knife on the right is ask someone to find the wood saw.
This can take as long as five minutes. This illustrates that as functionality increases,
it becomes harder to find what you want, or to discover what is possible.
However, there are deeper problems. Notice that, despite all of the different functions
available, only one person can access them at once, and even then, only one function at a
time. Again, the analogy to multimedia computers and interactive television is applicable.
Despite being able, in theory, to support shopping, learning, games, video on demand,
research and telecommunications, only one of these functions can be accessed at a time per
appliance, and there is generally only one controller available.
One could argue that this can be easily solved by having multiple appliances around the
house. After all, don't most houses already have more than one television?
There are two problems with this argument. The first is the assumption that, despite all of
the new capability, costs will not be substantially higher, so that consumers will be
able to afford multiple information appliances. The second relates back to the Henry Ford
design issue: the assumption that even if we could afford to have multiple
interactive televisions or multimedia computers, those would be the appropriate appliances
for the task.
We can do another exercise to gain some insight into these issues:
Exercise 4: Make a list of two columns. In the left column, list each of the gadgets (knife, saw, corkscrew, etc.) in the third knife in Figure 6. In the right column, list the room in the house where the function associated with that gadget is normally undertaken.
When I do this exercise, the list looks something like this:
Utensil         Location
saw             workshop
spoon           kitchen
fork            kitchen
scissors        sewing room
leather punch   stable
nail file       bathroom
corkscrew       dining room
What I want to emphasize here is the relationship between functionality and the location
where it is delivered. What this suggests is the provision of a diverse range of simple
specialized appliancettes where they are needed and in a form appropriate to their use and
user, rather than some Henry Ford style super appliance. To convince yourself of this
argument, just ask, would you not rather have a simple knife, fork, corkscrew and nail
file in the appropriate rooms, rather than carry around the fat knife in Figure 6, or have
knives like that distributed throughout the house? The knife solution is patently absurd.
But isn't that precisely what we are doing with interactive TV and multimedia computers?
Perhaps it's time to rethink our approach.
Figure 6: Three pocket knives with varying amounts of "embedded functionality."
We have argued that perceptions of technologies are dominated by what we see and touch:
primarily, the devices used for entering things into and getting things out of the system.
Current design is hampered by at least two problems. First, the limited repertoire of
input/output devices restricts our ability to provide a seamless bridge between the
objects and artifacts in the everyday physical world, and those in the electronic domain
of information systems. Second, by their "one size fits all" general approach,
current systems are inherently weak.
There is great potential in the emerging technologies. There is also a great risk that we
will miss the chance of actually achieving this potential.
The current models of computers and television are dominated by too narrow a range of
appliances - so much so that we can neither separate the medium from the appliance (for
example, "television"), nor develop any unifying models which tie together the
different pieces of the story (for example, television and the Internet).
We have to push harder, and on a wider field of view. If this essay has helped in bringing
this about, even a little, it has been worthwhile.
The ideas expressed in this essay have developed over the past few years during
conversations with a wide range of people. In the main, they were cultivated during
discussions with colleagues at Xerox PARC, the Ontario Telepresence Project and Alias |
Wavefront Inc.
The research underlying this essay has been generously supported by Xerox PARC, the
Ontario Telepresence Project, Alias | Wavefront, the Information Technology Research
Centre of Ontario and the Natural Sciences and Engineering Research Council of Canada.
Buxton, W., Sniderman, R., Reeves, W., Patel, S. & Baecker, R. (1979). The
Evolution of the SSSP Score Editing Tools. Computer Music Journal, 3(4), 14-25.
Reprinted in C. Roads & J. Strawn (Eds.) (1985). Foundations of Computer Music.
Cambridge, MA: MIT Press, 376-402.