Voice Input for Computer Learning in the 21st Century

When the crew communicates with HAL in Arthur C. Clarke's 2001: A Space Odyssey, they simply talk, and HAL talks back to them. Science fiction writers have long understood that talking is the natural way to interact, with or without computers.

In the real world, speech recognition was originally researched on large, expensive computers, primarily in laboratories at Carnegie Mellon University and at IBM, and was not considered practical for everyday use. Few people were directly acquainted with voice-activated computers. Now, however, the computer industry has advanced speech input to the category of "practical." Competing products are currently available for personal computers at reasonable prices. Dragon, Lernout and Hauspie, and IBM have developed products that accept normal continuous speech, with no pauses between words. Their accuracy is high and improves as a person uses the product. These systems take context into account, so they can distinguish between homophones such as to, too, and two, or there, their, and they're. The focus of voice input has been on dictation and on built-in speech recognition for existing applications (e.g., Open Word). To date, speech input software is little known and little used in education.

A major question arises: If computers can interact via voice, can voice input be used as a learning tool?

The Learning Process, an Interaction

To understand the role of voice input in effective computer learning units, we need to consider the importance of interaction and individualization. A central need in learning is providing highly individualized help to each student from moment to moment, responding directly to the problems of that student. Benjamin Bloom and his students demonstrated, in experiments in the Chicago public schools (starting in 1980), that mastery is attainable for almost everyone when tutors are available to assist with student problems. But only a few people can afford such tutoring, and the number of skilled tutors is limited, making such learning opportunities uneven.

However, with interaction made possible using carefully programmed computers, we can now offer individualized tutorial aid to every student, at a level impossible to achieve with books or lectures. We can look for learning problems and offer direct immediate assistance, making excellent learning readily available. Such computer-based tutorial learning can in the immediate future be accessible for everyone, through distance learning, not just for a few wealthy individuals or nations. But this possibility is not being realized now.

Learning via the Computer

Learning material on the computer today is less interactive than it was a dozen years ago. One of the problems is the increasing use of point-and-click capabilities, an inadequate form of interaction for full communication. Many designers of computer-based learning material are familiar only with the non-interactive textbook and lecture mode of learning. They replicate this form of knowledge transfer in their computer software.

Most World Wide Web learning sequences also follow a textbook or lecture mode. Video material is almost always in this same non-interactive format, delivered either on or off the Web. With these methods of learning, individualization is seldom attained. We largely ignore the major problems of helping students having difficulties and of keeping them interested in learning. The result is that many students who were already not learning well in classrooms are now continuing to not learn well, this time in computer-programmed conventional education. True mastery of a subject for all students is unattainable with this mode of learning.

We know how to develop highly interactive tutorial learning. Voice input will help.

Typing for Computer Input in Learning

Beyond pointing and clicking, another tool hampers learning effectiveness. Typing, a crude and old-fashioned technique with its peculiar QWERTY keyboard, is not an easy way to communicate. The keyboard was originally designed to slow down typing in order to prevent jamming on mechanical typewriters. Therefore, frequently used letters like e and t were not placed in the home position.

We waste large amounts of student time teaching typing (now called keyboarding) in schools. Although somewhat replaced by the increasing use of the World Wide Web, keyboarding is still a major form of computer use, with new standards for typing speed for each school grade. For many students, typing is a slow and cumbersome mode of communication, a mechanical intervention between thought and writing. It is most punishing for the typist who is unskilled or who lacks good hand-coordination.

Until recently, typing was the only possibility for student input. At the University of California, Irvine, we began using typing for input in highly interactive tutorial units over thirty years ago. Such projects as the Scientific Reasoning Series and Understanding Spoken Japanese used typing, because no better mode of communication was available allowing full language use.

Voice for Computer Input in Learning

Now that computer speech input is practical, we should substitute it for keyboard input. In highly interactive computer-based tutorial learning units, speech input would be particularly valuable, allowing a friendly conversational interface for offering help with student problems and for keeping students interested in learning. No voice training would be required because of the limited number of inputs sought by the program at each point.
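The point about limited inputs can be illustrated with a minimal sketch. It assumes a speech recognizer has already produced a text transcript, and it only shows how a tutorial program might map that transcript onto the few responses it expects at one point in the dialog. The response table, the action names, and the similarity cutoff are all hypothetical, and the sketch does not describe how any commercial recognizer works internally.

```python
import difflib

# Hypothetical tutorial step: the small set of answers the program is
# prepared to handle at this point, each mapped to an illustrative action.
EXPECTED = {
    "yes": "confirm",
    "no": "deny",
    "i don't know": "offer_help",
    "repeat the question": "repeat",
}

def normalize(text):
    """Lowercase and keep only letters, apostrophes, and single spaces."""
    kept = "".join(ch for ch in text.lower() if ch.isalpha() or ch in " '")
    return " ".join(kept.split())

def interpret(transcript, expected=EXPECTED, cutoff=0.6):
    """Map a recognizer transcript to one of the expected responses.

    Returns the action name, or None when nothing is close enough,
    in which case the tutorial would ask the student to try again.
    """
    heard = normalize(transcript)
    matches = difflib.get_close_matches(heard, list(expected), n=1, cutoff=cutoff)
    return expected[matches[0]] if matches else None
```

Because only a handful of phrases are acceptable at each point, a slightly misrecognized answer such as "I dont know" can still be matched to the intended response, while an unrelated utterance is rejected and the student is asked again.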

Skilled typists may not want speech input; they can continue to type. But from my experience in the world of learners, I think that few students, young or old, are marvelous typists, or even marginal ones. My prediction is that students, even the skilled typists, will quickly shift to computer voice input programs, which are more natural and more comfortable than typing.

We will, however, need much more empirical information comparing these two input modes.

Future Work and Problems

In discussing the recent development of voice input software, it is important to acknowledge our lack of good information. To progress further, more research is essential.

1. More usage is needed with voice input for learning, as with any new and promising technology. This involves making voice input an integral part of a range of learning products, for students at all levels from early childhood to old age, and of many different backgrounds, including cultures and languages. We need careful empirical studies. Little work is going on at present, and therefore there is a dearth of such material.

One current undertaking is Speak to Write, a project at the Education Development Center in the Boston area that primarily concerns handicapped students. An interesting list server is available.

We began one such study at the University of California, Irvine, as David Britton's thesis work. This work is based on the interactive software The Scientific Reasoning Series, marketed for many years in keyboard input form by IBM. We are creating two programs, almost identical except that one accepts voice input while the other accepts keyboard input. We will compare these two programs in use with typical students. The software is intended for high school use. Unfortunately, this work has been interrupted for financial reasons, but we hope it will soon be back on track. This should be only one of many similar studies.

2. Special problems in learning may be greatly helped with voice input. Among these are the areas of very young children, reading and writing, adult illiteracy, and ethical problems.

Very young children, too young to read and type. Opinions vary as to the difficulties that will be encountered with voice input for very young children. Available learning units for young children are presently based primarily on pointing and clicking. Yet we know that children, marvelous learners that they are, have actually acquired their language through listening. We should be able, with computer voice input, to capitalize on this learning ability and on the tremendous potential, enthusiasm, and energy of young children.

Reading and writing. Learning to read and write at an early age with tutorial voice input seems to be a particularly attractive possibility, perhaps revolutionizing our current learning methods in these very important areas. Reading and writing might be learned together, as part of the same process. We have literature available.

Adult illiteracy. Adult illiteracy, a global problem deserving serious attention, is another field for tutorial learning material involving voice input. There are one billion illiterate adults in the world, about 2/3 of them women. Adult illiteracy is a major factor leading to poverty and violence. Even in developed countries this is often a problem not being solved by current approaches.

Ethical problems. Tutorial voice-activated learning units might play a major role in solving major problems of the world, including ethical problems. One such problem is the violence in our midst, to which attention is increasingly being directed.

Schools, universities, and adult learning. Eager for more effective approaches, schools and other players in education may welcome voice input tutorial computer programs. We believe that experience will show that voice input is superior to keyboard input in learning software at all levels, particularly in highly interactive adaptive learning units.

3. Voice input needs to be fully integrated into many applications. We need word processors, for example, designed from the beginning with speech recognition in mind, for all functions. By tacking voice input onto products designed for typing and pointing, we do not gain the full potential of voice.

We have done some preliminary design on such a word processor intended initially for young children, as one component of a program to learn to write starting at about seven years. Market potential would appear to be very high for such a word processor. This software could grow as the child grows, introducing new features over many years.

Other projects involve using voice for the scheduling and electronic mail aspects of computer use. The SpeechActs project at Sun is one such example.

4. Voice input could be built into operating systems as an original component. Current personal computer operating systems are derived from the Alto and Star systems at the Xerox Palo Alto Research Center. An operating system designed from the beginning with voice input would look very different from current systems. It could be much friendlier, particularly for new users of the computer, and it could grow as the user matures in her or his use of the computer.

5. Systems may be designed without keyboards. Careful experimental studies are needed to compare the outcomes of voice and keyboard input, in learning with computers. Both learning and affective issues are important in such studies.


The possibilities discussed above raise the question: Is the keyboard a computer necessity? It may turn out that keyboards can be entirely replaced by voice input for most computer uses. As with other suggestions in this paper, this can be determined only through experience with such systems. Speech recognition by the computer is still a rare component, and most applications have not been in learning. We believe that, of all the new technologies of recent years, voice input will have the greatest long-range effect on learning.

References

Bloom, B. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13(6), 4-16.

The Future of Learning: An Interview with Alfred Bork. (1999, July/August). Educom Review.

Bork, A. (2000). Learning Technology. EDUCAUSE Review.

Bork, A., & Caftori, N. (1999). Computers and Major Ethical Problems in our Society. Simulation.

Bork, A. (1999). Global Learning Society. Student Pugwash.

Bork, A., et al. (1994). The Irvine-Geneva Course Development System.

Bork, A. (1987). The Potential for Interactive Technology. Byte.

Yankelovich, N., et al. Designing SpeechActs: Issues in Speech User Interfaces. http://groucho.www.media.mit.edu/people/groucho/papers/speechacts.html


Critical Reviews

Critic ZZ

Critic G

This is a potentially interesting paper, but at times it seems to assume an understanding on the reader's part that may not be reasonable to assume. For example, I do not quite see how voice input will help interactive tutorial learning more than a keyboard would, though I would not deny there will be benefits. It could seriously be argued that writing something gives it greater potential for being remembered and inwardly digested than speaking it. The deliberation of typing might even add to that potential. Voice could just as easily lead to the flicking from page to page that the mouse tempts the learner to do in a Web environment. I do not follow very well the uses specified in item 2 of Future Work and Problems; these need more unpacking for a reader to see how voice input will help.

Agreed that more research is required, but still one might speculate on certain downsides: noise/interference in a learning center or computer room with all students speaking at once? Problems when I have a sore throat? Then there are upsides like less tendonitis!

And just when we thought computers were bringing back the arts of writing and good spelling...

I think the connection of speech recognition to improved potential for learning needs greater development in this piece. Also at times it seems that speech recognition is being conflated with simple audio transmission by computer, though the two may go hand in hand to good effect.

So the paper needs tightening up in its arguments, with more thoughtfulness and more explanation of consequences, effects, and advantages.

Critic V

This piece starts off with a bang but ends with a whimper. It seems to be moving in too many directions, but two stand out: (1) voice recognition (VR) as the key to dynamic, intelligent human-machine interaction, a la HAL; and (2) the pedagogical potentials of VR. I'm intrigued by the first and interested in the second. I'd like to see this paper revised with a single focus. As is, the case for the human-machine interaction idea is weak; perhaps this focus should be saved for a future submission. The case for the second is much more promising. However, I'd recommend adding (A) more detailed information relating features of specific VR software that appear to be promising for particular educational uses and (B) specific recommendations to VR developers re features that educators would consider invaluable.