Theodor H. Nelson
This paper was presented at the International Conference on Online Interactive Computing held at Brunel University, Uxbridge, England, September 4-7, 1972.
Abstract
Bush was right. His famous article is, however, generally misinterpreted, for it has little to do with "information retrieval" as prosecuted today. Bush rejected indexing and discussed instead new forms of interwoven documents.
It is possible that Bush's vision will be fulfilled substantially as he saw it, and that information retrieval systems of the kinds now popular will see less use than anticipated.
As the technological base has changed, we must recast his thesis slightly, and regard Bush's "memex" as three things: the personal presentation, editing and file console; a digital feeder network for the delivery of documents in full-text digital form; and new types of documents, or hypertexts, which are especially worth receiving and sending in this manner.
We also consider a likely design for specialist hypertexts, and discuss problems of their publication.
Beating Around the Bush
Twenty-seven years ago, in a widely acclaimed article, Vannevar Bush made certain predictions about the way we of the future would handle written information (1). We are not yet doing so. Yet the Bush article is often cited as the historical beginning, or as a technological watershed, of the field of information retrieval. It is frequently cited without interpretation (2,3). Although some commentators have said its predictions were improbable (4), in general its precepts have been ignored by acclamation.
In this paper, an effort in counter-discipleship, I hope to remind readers of what Bush did and did not say, and point out what is not yet recognized: that much of what he predicted is possible now; the memex is here; the "trails" he spoke of—suitably generalized, and now called hypertexts—may, and should, become the principal publishing form of the future.
In July of 1945 an article entitled "As We May Think," by Vannevar Bush, was published in the Atlantic Monthly. It bristled with technical references but was actually fairly candid and simple.
It predicted many things. Bush, as director of Roosevelt's wartime Office of Scientific Research and Development, had seen the new ways in which technologies could be combined. In the urbane paragraphs of this article, Bush predicted a variety of useful future machines, including improvements in photography, facsimile systems, computers and miscellany. Depending on how you read it, he predicted, as well as you could hope, devices closely related to the Polaroid camera, the Xerox machine, computer transformation of mathematical expressions, and the telephone company's ESS switching system.
But the article is best remembered for its description of the new ways that scientists and scholars could handle and share their ideas, writing, reading and filing in a magical system at their desks. The system is the famous "memex."
The memex will hold all the writings its master wants to read, and he can read them easily.
Moreover, he can compare and annotate them.
Not only ordinary documents need be held in the memex. The user may make connections between different parts of the things stored. He does this by
By this associative technique he may create "trails," new documentary objects that are useful in new ways.
These new structures, or trails, may be taken and given to other people.
And they may be published.
It is strange that "As We May Think" has been taken so to heart in the field of information retrieval, since it runs counter to virtually all work being pursued under the name of information retrieval today. Such systems are principally concerned either with indexing conventional documents by content, or with somehow representing that content in a way that can be mechanically searched and deciphered.
This is indeed paradoxical. On the one hand, Bush did not think well of indexing.
On the other hand, with regard to content retrieval, Bush merely hinted about the use of structured-data representations and calculi in storing ideas (105, col. 2, para. 3), and did not plainly relate them to his main exposition. The reason is plain: his real emphasis was on linkage, and new structures and activities that the automatic link-jump would make possible. While we might argue scripture about such matters, the fact is that Bush's most extensive concern has had few successors in the field called "information retrieval."
Transposition
The memex was to be a single screen console for handling the user's notes, writing and correspondence, for reading books and other writings created by others, and for creating new associative text structures, which may in their turn be read and distributed. All this I take to be the heart of Bush's prediction. This will happen. Such systems exist; they are approaching cost feasibility; and the world is readier than it thinks.
Bush's machine will not, of course, be built exactly as he foresaw. The complete description, which I omitted, involves microfilm cassettes, a photographic copying plate for adding new images to the microfilm file, and a telautograph stylus. Other machines he describes, such as the forehead camera and the direct-dictation typing machine, might or might not have been coupled to it as well. In the revised version of the article (5) his emphasis shifted to videotape. These impedimenta we ignore. There exist microfilm cassettes, copiers and so on, but the hardware ready to support a memex-class system will be something else.
The system will be built from existing computer equipment and peripherals. Physically it will be a computer display, with a keyboard, at the user's desk; a support computer system (at the desk or elsewhere) for handling various technical chores; and a library network of digital feeder machines. The written materials, when not shown on the screen, will be stored and sent digitally, as telegraphic symbols. They will be sent back and forth among these systems automatically, as programmed in the various devices. The trails, or associative text structures (more generally called "hypertexts"), will be stored in coded form, along with the more conventional documents in the various devices. The user will be billed automatically for the services and the delivery of copyrighted materials. The publisher, who maintains these copyrighted materials in the feeder machines, will be duly paid for their use.
Prototype units exist now. Appropriate console hardware can be purchased now for about $15,000.
Supply systems, however, are not quite ready. The best supply system may be a special-purpose computer or a general-purpose time-sharing system; which is better is not clear. While costs of either are presently in the tens or hundreds of thousands, they are coming down fast, and the use of a well-tailored system by many people at once should bring the cost of such service down considerably. To name a figure arbitrarily, let us say that a service cost of $100 a month per user (exclusive of telephone lines and copyright) would be sufficiently low to draw many users. Such service at such a cost will surely be generally available between one and five years from now.
Various preoccupations have delayed us psychologically. We do not need direct dictation, optical scanning or the availability (sic) of vast libraries for such systems to be immediately practical and important to us.
The Console
We are speaking of a single console to handle notes, writing, much correspondence, much reading, and the creation of new kinds of texts. On it the user must be able to view, edit, file, and otherwise manipulate.
Let me now describe a system for various kinds of text handling, the Xanadu Parallel Textface™. XANADU™ is a system presently under development as a turnkey computer graphics console for non-technical users. A stand-alone system, it is built on a minicomputer with disk and/or cassette tape, vectoring display screen, and keyboard. A variety of unusual programming techniques permit various types of screen animation, as well as automatic retrieval and data-base editing in the undersystem; these automatically service different user front ends, faces or theaters. Foremost among these theaters is the Parallel Textface, a text system of some power and delicacy.
Many of its features exist in other text handling systems, notably that of Douglas C. Engelbart at Stanford Research Institute (6). The purpose of this description is to show parallelisms between memex and this general type of system, not to distinguish this system from its relatives.
I will speak of the Xanadu hypertext system, or Parallel Textface, simply as "the system," "the current system" or the like, to distinguish it from the memex. As its implementation is not yet complete, this description applies to the present specifications and not only the parts that are working. We will note some resemblances to the memex.
The user sits at a display screen with a typewriter keyboard, a light pen or other pointing tool, and various optional controls. With these he may read, explore, annotate, write and revise. Storage is of course digital rather than pictorial; the system may manipulate the words letter by letter, rather than as a single image.
In basic editorial operations, the system presents text materials on the screen; the user may command basic writing and editing actions by simple manipulations. Indeed, he may make these editorial changes tentatively, on copies or alternative versions of the material; and he may, at his option, have his actions recorded automatically in a cumulative editorial log, in case it is later necessary to retract any of them.
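The editorial log can be pictured with a minimal sketch in Python. This is an illustration under stated assumptions, not a description of the Xanadu implementation: the class name `EditLog`, its methods, and the choice to record each edit as a (position, removed text, inserted text) triple are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EditLog:
    """A text buffer that records every edit so that it can be retracted."""
    text: str = ""
    # Each entry is (position, removed_text, inserted_text). Hypothetical layout.
    log: list = field(default_factory=list)

    def replace(self, start: int, end: int, new: str) -> None:
        """Replace text[start:end] with `new` and record the change."""
        removed = self.text[start:end]
        self.text = self.text[:start] + new + self.text[end:]
        self.log.append((start, removed, new))

    def retract(self) -> None:
        """Undo the most recent edit; do nothing if the log is empty."""
        if not self.log:
            return
        start, removed, inserted = self.log.pop()
        end = start + len(inserted)
        self.text = self.text[:start] + removed + self.text[end:]


draft = EditLog("Bush was wrong.")
draft.replace(9, 14, "right")  # a tentative change
draft.retract()                # the result did not please; take it back
```

Because each entry keeps the removed text as well as the inserted text, edits can be retracted one at a time in reverse order without storing whole copies of the document.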
Bush did not really go into memex editorial operations, but of course there was that keyboard. It is interesting to note that the memex described in the revised version of Bush's article (5) kept a log of manipulations by the user. (95-6)
The system is generally geared to tentative and thoughtful operation. The alternative versions and editorial log are the strongest examples, but there are others. An editing command may be retracted if its results do not please. Another example: a section of text being tried in a new place is shown glowing more brightly, so you know its limits within the new setting.
The user of Bush's memex called its contents by means of a "code book" (107 passim), but this too was actually stored in the system. From the code book the user was to choose contents for viewing "with a single tap of the key," as mentioned earlier. In the current system it is possible for the user to call something to the screen by picking the name from a screen menu or relational diagram with the light pen. He may do this not just with whole units, but also with subparts and different versions (separate copies) of documents being tentatively edited.
In the memex, the user could skip through a document at adjustable rates (107, col. 1, para. 5). In the current system he may zip forward through the text at whatever speed he chooses, watching it slide on the screen. (The necessity of smooth, incremental text motion for the CRT is not yet generally recognized.)
The memex user looked at several documents simultaneously through several screens or panels of the same screen (107, col. 1, para. 6). This simultaneous viewing will be possible for several documents on the Parallel Textface; indeed, explicit linkages between associated texts may be viewed on request. This is the notion of "parallel text," useful for commentaries, translations, intercomparisons of documents and much else. These facilities correspond rather well, I think, to Bush's "marginal notes and comments" between the "several projection positions."
The memex user created annotations by hand, or links between the things that were being viewed simultaneously. This is possible on the current system, which allows the user to create links between text sections regardless of whether or not they are parts of the same unit or otherwise related. These may be between parallel texts, or among free-floating paragraphs (discrete or "free" hypertexts), or in virtually any other useful arrangement.
For any system of this kind, design problems arise in the richer operations, such as creating and modifying connective structures. It is taken for granted that the console must be easy to use. That is no design problem for a small set of operations, such as text editing. But the design of the overall file handling elements and actions is more complicated. There are two problems. The first is how to achieve the desired performance values on available equipment and accepted software setting. This becomes a tradeoff. The grander problem, though, is conceptual unification for the system's filing structure and conventions.
In the current system there is presently only one type of connector. This restriction nevertheless permits a system sufficient to support real hypertext experimentation. Complex or annotated couplings are presently not defined. However, simple links are adequate for various possible forms of discrete-jump hypertexts, including Bush trails, and may in principle be extended to computer responsibility for link behaviors and complex coupling maintenance.
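A single, unannotated connector between text sections might be sketched as follows in Python. The names `Span` and `LinkStore` are hypothetical, and two points are assumptions made for illustration rather than facts about the current system: that a text section is addressed by a document name and a character range, and that a link can be followed from either end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A section of text: a document name plus a character range [start, end)."""
    doc: str
    start: int
    end: int


class LinkStore:
    """Holds simple links between spans. There is only one type of connector."""

    def __init__(self) -> None:
        self.links: list[tuple[Span, Span]] = []

    def join(self, a: Span, b: Span) -> None:
        """Link two sections, whether or not they belong to the same unit."""
        self.links.append((a, b))

    def linked_from(self, doc: str, pos: int) -> list[Span]:
        """Return the far ends of all links touching position `pos` in `doc`."""
        found = []
        for a, b in self.links:
            if a.doc == doc and a.start <= pos < a.end:
                found.append(b)
            if b.doc == doc and b.start <= pos < b.end:
                found.append(a)
        return found


store = LinkStore()
note = Span("notes", 0, 40)
passage = Span("encyclopedia", 120, 300)
store.join(note, passage)
targets = store.linked_from("encyclopedia", 150)  # the linked note
```

Typed or annotated couplings would need an extra field on each link; the sketch leaves it out to match the single-connector restriction described above.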
Hypertexts
While Bush's term, "trail," represents a very useful concept, we must generalize it. Bush's interest in microfilm led to his idea of the trail having a sequence.
By "trail," Bush appears to have meant a sequence of documents, document excerpts, and comments upon them.
For [the user] runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. (107, col. 2, para. 4)
This sequence would be established by making paired couplings.
When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined projected onto adjacent viewing positions.... The user taps a single key, and the items are permanently joined. (107, col. 2, para. 2)
Bush mentions two other types of trail. One is the "side trail," branching out from a main trail sequence.
Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item.
The other type of trail is the "skip trail," a subset of a main trail sequence that contains the highlights.
The historian, with a vast chronological account of a people, parallels it with a skip trail which stops only on the salient items, and can follow at any time contemporary trails which lead him all over civilization at a particular epoch. (108, col. 1, para. 1)
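The three kinds of trail Bush mentions can be sketched in a few lines of Python. This is only one possible reading of his text: the class `Trail`, its method names, and the item titles in the usage lines are hypothetical, and the skip trail is taken here to be the salient items kept in main-trail order.

```python
from dataclasses import dataclass, field


@dataclass
class Trail:
    """A named sequence of items, with optional side trails off any item."""
    name: str
    items: list = field(default_factory=list)
    side_trails: dict = field(default_factory=dict)  # item -> Trail

    def add(self, item: str) -> None:
        """Join one more item onto the end of the trail."""
        self.items.append(item)

    def branch(self, item: str, side: "Trail") -> None:
        """Attach a side trail to an item already on this trail."""
        if item not in self.items:
            raise ValueError(f"{item!r} is not on trail {self.name!r}")
        self.side_trails[item] = side

    def skip_trail(self, salient: set) -> "Trail":
        """Build a skip trail: the salient items only, in main-trail order."""
        skip = Trail(self.name + " (skip)")
        skip.items = [item for item in self.items if item in salient]
        return skip


main = Trail("bow and arrow")
for title in ["encyclopedia article", "history item", "own comment", "physics text"]:
    main.add(title)

side = Trail("elasticity")
side.add("textbook on elasticity")
main.branch("physics text", side)

highlights = main.skip_trail({"encyclopedia article", "physics text"})
```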
In Bush's trails, the user had no choices to make as he moved through the sequence of items, except at an intersection of trails. With computer storage, however, no sequence need be imposed on the material; and, instead of simply storing materials in their order of arrival or of being noticed, it will be possible to create overall structures of greater useful complexity. These may have, for instance, patterns of branches in various directions. Such non-sequential or complex text structures we may call "hypertexts." (7)
"Hypertext" is the generic term; there are reasons, for which there is no room here, to rule out such other candidate terms as "branching text," "graph-structured text," "complex text" and "tree text."
The best current definition of hypertext, over quite a broad range of types, is "text structure that cannot be conveniently printed." This is not very specific or profound, but it fits best.
As Bush pointed out in his own terms, we think in hypertext (106, col. 2). We have been speaking hypertext all our lives and never known it. It is usually only in writing that we must pick thoughts up and irrelevantly put them down in the sequence demanded by the printed word. Writing is a process of making the tree of thought into a picket fence.
Hypertext structures are varied. For instance, they may be free-branching with only one type of link and backing up; they may have modal links with different meanings in a free structure; or have modal links and repetitive structure.
Discrete-jump hypertexts are not the only kinds. There is, for example, Stretchtext™. This is continuously variable text which never leaves the screen, but changes by small increments on user demand, growing longer and more detailed by a few words at a time, as required. Other continuous types are possible.
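One way to picture text that grows by a few words at a time is the following Python sketch. It is a guess at a workable representation, not an account of how Stretchtext is built: the idea of tagging each fragment with the detail level at which it first appears, the name `stretch`, and the sample sentence are all assumptions made for illustration.

```python
# Each fragment carries the detail level at which it first appears.
# Level 0 fragments are always shown. (Hypothetical sample text.)
SEGMENTS = [
    (0, "The memex stores documents"),
    (2, " on microfilm cassettes"),
    (1, " and lets the user link them"),
    (3, " with a single tap of a key"),
    (0, "."),
]


def stretch(segments, detail: int) -> str:
    """Render the text, keeping only fragments at or below the detail level."""
    return "".join(text for level, text in segments if level <= detail)


shortest = stretch(SEGMENTS, 0)
longer = stretch(SEGMENTS, 1)
longest = stretch(SEGMENTS, 3)
```

Raising the detail level by one adds a few words in place while the earlier wording stays where it was, which is the behavior the paragraph above describes.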
Just as items may be coupled, whole hypertexts may be coupled into books, or one another. An example of the first would be a hypertext with annotations coupling into the Bible. Such multi-couplings involve bundles of pointers between the texts, possibly with type codes or annotations. (In the current system the non-annotated multicoupler is the canonical case between files.) They may also involve alternative versions, which there is no room to discuss here. The structuring of these coupler types is a continuing design task for hypertext systems.
The creator of hypertexts may allow the user various options of jumping or branching. These options can lead the user to further reading in any pattern the author wants to make available to him. The only constraints on the author are usefulness, clarity, and artfulness.
There must, of course, be ways the reader may see, and choose among, possible branches from an item in the text. This problem was implicit in Bush's treatment. Since "any item can be joined into numerous trails" (107, col. 2, para. 3) there would have had to be some way of showing the user these options and their meaning. This is the case in general, and a standing aspect of hypertext system design.
Hypertexts may be casual rough notes, as described in Bush's extended example of the Turkish bow and arrow; or they may at the other extreme be finished units, editorially completed and organized. Such finished units would have many of the same properties as ordinary writing: intentional assembly, attempted clarity and expository structure of enumerable "points," and an overall comprehensible pattern whose interrelated parts may be in some way remembered or visualized. Finally, the concept of authorship applies to hypertexts as much as it does to an ordinary book or article.
As with ordinary texts, too, the editorial properties and "feel" of hypertexts may be quite distinct and varied. For hypertexts these are of course largely unexplored. It is also very hard to anticipate their possible administrative and social settings, and this will greatly affect their character and the modes of their use.
The Transmission Network
A general transmission network will carry requested documents from libraries to users, new documents from users to libraries, and communications and documents between users.
The network will consist of several computers or computer-like objects. In the user's own unit is digital logic, and possibly (as in our Xanadu) a small computer; if not, this unit is serviced or managed by a computer which stores the user's files and communicates with the library network. The user's requests for documents that are not available locally go out to a library network. These requests are sorted out to the appropriate repository machine; the repository machine returns the document and a bill or fee schedule.
Various fees are logged up to the user. These will include various basic costs, such as membership in the system, rental of the terminal and hookup. Additional fees may include logged-in time, per-usage costs of various facilities (such as average memory area occupied and quantities of text moved), storage charges for materials kept locally, and royalties to copyright holders. It should be noted that various grades of service may also exist, in which the user gets faster service by claiming larger buffers and higher priorities, and pays for the privilege.
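The fee structure just listed can be put as a small worked calculation in Python. Every number below is a placeholder; no actual rates are proposed here. The names `Usage`, `RATES` and `monthly_bill` are hypothetical, and so is the assumption that a higher grade of service simply multiplies the usage-based fees.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """One user's activity for a billing period (all fields hypothetical)."""
    hours_logged_in: float
    kilochars_moved: float
    kilochars_stored: float
    royalties: float  # owed to copyright holders, in dollars


# Placeholder rates in dollars, chosen only to make the arithmetic visible.
RATES = {
    "membership": 20.00,
    "terminal_rental": 50.00,
    "per_hour": 1.00,
    "per_kilochar_moved": 0.01,
    "per_kilochar_stored": 0.005,
}


def monthly_bill(u: Usage, priority_multiplier: float = 1.0) -> float:
    """Fixed fees plus usage fees; a higher grade of service scales usage fees."""
    fixed = RATES["membership"] + RATES["terminal_rental"]
    usage = (
        u.hours_logged_in * RATES["per_hour"]
        + u.kilochars_moved * RATES["per_kilochar_moved"]
        + u.kilochars_stored * RATES["per_kilochar_stored"]
    )
    return round(fixed + usage * priority_multiplier + u.royalties, 2)


month = Usage(hours_logged_in=10, kilochars_moved=1000,
              kilochars_stored=2000, royalties=5.00)
ordinary = monthly_bill(month)        # ordinary grade of service
priority = monthly_bill(month, 2.0)   # faster service, at a price
```

Royalties are passed through unscaled in this sketch, on the assumption that the copyright holder's fee does not depend on the grade of service the user has chosen.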
Although this may sound like a formidable prospect, in general and with polishing there is reason to hope that the real costs of such a system will compete favorably with the real costs of the forms of publication and libraries we now employ. (Of course, in such "real cost" we must include the library services supplied "free" by various levels of government, including municipal libraries, grants to universities and the indirect subsidy of publishers by low postal rates. It is not unthinkable that similar encouragement will come about for this form of publishing and libraryship.)
Various technical design issues exist. These involve the feeder computers and their forms of memory. These hierarchies of memory are fairly clear. They will generally include disk (for working areas and directories), magnetic card or data cell (for the corpus), and magnetic tape (for rarely-needed materials and safety copies). Immense solid-state memories will never take over completely unless they are cheaper.
A more difficult question is, what should the feeder computers be? Their job in this system is the lookup and shovelling of text, plus bookkeeping. One school of thought holds that a true general-purpose time-sharing system is necessary; another, that the correct machine is a dedicated computer with rich interrupts and comparatively little arithmetic capacity. The third school would point out the special character of the work and lean toward special designs and special tradeoffs, which could be anything from associative memory to the use of delay-line machines.
Similar issues exist for memory software and directory systems. There are complex technical problems of index and search techniques, and methods of their cross-tabulation. But they can be handled in some way or other.
A warning is necessary here, however. This area of console support is the area where things are not yet ready. The prediction of economic feasibility in five years, an eon in the computer world, is not the same as feasibility now. By devoting a whole computer, disk and tape to each user, the problem of console support can now be solved, in a manner of speaking; but the general problem of interleaved I/O and file management, with the efficient sharing of facilities, is another problem entirely, and the one that must be solved to make this whole thing go.
Publication: Redesign of the Technical Literature
Bush regarded his new text structures as transmissible between individuals, and publishable. The same is true of hypertext units, the generalized form of trails. I think it likely that once such systems are available, the creation of branching and complex text will become recognized as far more natural than the structures in which we now must write.
This will all follow naturally from the existence of consoles which permit multiple couplings between texts. Having created for personal use a hyper-document on one's console, it will seem only reasonable to press a button passing this on to a colleague in its hyper-form, without chopping and aligning it into conventional writing.
Various interesting possibilities follow. Private "journals" in a field may be started among co-workers merely by the pooling of their hyper-documents. The rental of memory space on magnetic cards is inconsequential next to what have been the costs of printing and mailing.
When professional and technical societies become interested in sponsoring hyper-publication, one of the most straightforward ways to begin would be with the creation of society-sponsored review articles. These could be like the ordinary review articles sponsored by such societies, save that the review article would open directly into the various materials it was reviewing, and footnotes could be more extensive and slanted to different categories of reader. (Figure 1.)
The next step is, of course, the creation of hyper-magazines or journals under the sponsorship and supervision of professional societies. Here the problem of organization would seem to become thorny. But this collection could be much like the journals of today, except for the direct availability of previous literature, working papers and various odds and ends. It should be observed that any of the "documents" noted in the illustration can themselves be hypertexts. (Figure 2.)
In this magazine, an arbitrary conjecture, all material of the past year is considered "recent" and embraced in a common lookup structure. New material enters the collection surreptitiously, at whatever time of day or night the editors release it; material one year old is formally expelled to a different file on the first of each month. (It may be just as accessible, but its nominal status changes, rather like that of a book taken off the "recent acquisitions" list of a library.) This magazine would hold most of what people were talking about in a field, and it would all be right there.
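The monthly expulsion in this conjecture amounts to a simple sweep over the "recent" file, sketched below in Python. The function name, the use of two dictionaries mapping titles to release dates, and the sample titles are hypothetical; the sketch also assumes that "one year old" means released on or before the same calendar day of the previous year.

```python
from datetime import date


def first_of_month_sweep(recent: dict, archive: dict, today: date) -> None:
    """Move items released a year or more before `today` into the archive.

    `recent` and `archive` map item titles to release dates. The function
    is meant to be run on the first of each month, so the same day always
    exists in the previous year.
    """
    cutoff = date(today.year - 1, today.month, today.day)
    # Iterate over a copy, since items are removed from `recent` as we go.
    for title, released in list(recent.items()):
        if released <= cutoff:
            archive[title] = recent.pop(title)


recent = {
    "old paper": date(1971, 8, 15),
    "new paper": date(1972, 3, 2),
}
archive = {}
first_of_month_sweep(recent, archive, date(1972, 9, 1))
```

Only the nominal status changes: an expelled item stays in a dictionary of the same shape, so it can remain just as accessible as before.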
There are numerous technical complications to hypertext publishing. There is no room here to discuss the more esoteric technical ones, such as facilities, billing and copyright conventions, or possible techniques of encryption and validation to prevent pirating of works and to authenticate expeditionary versions.
How to Begin
How shall it begin?
Those contemplating massive retrieval systems commonly presume that they must begin with some massive corpus all accessible. The Library of Congress is often mentioned. Even Bush supposes regretfully (in the revised article, p. 100) that the personal system waits on large public establishments being automated first.
I do not believe this is so. It will be practicable and of considerable interest to begin on a small scale, having no grand corpus available. The grand corpus will come soon enough, as requests emerge. (We have a precedent: the prowess of University Microfilms, Inc. in rendering texts available to scholars in microfilm.)
The way to begin is to furnish supported consoles to small communities of users: key members of a "small" discipline, or specialists among whose work there is close connection. Suitable groups might be "early Egyptologists," or just plain "everybody at Woods Hole."
Such communicants, having been assured as well as possible of privacy and fail-safe design, will be encouraged to use the consoles fully. From the outset they may keep all their notes, manuscripts, articles, and copies of outgoing correspondence, on the system.
The rest will follow. I am fairly sure of the predictions so far, at least in broad outline, but I am just as sure that the first generation of hypertext users will invent twice as much as has been descried and described so far.
Who will support these beginnings? We have a choice, at the outset, of universities, publishing companies, computer companies (including service bureaux), research organizations or the government. Any of these might take such an initiative. Though such an initiative would seem severally unthinkable, it somehow seems collectively plausible and, of course, historically inevitable. If you believe in manifest destiny.
Theodor H. Nelson
Endnotes
1. Vannevar Bush, "As We May Think." The Atlantic Monthly 176:1 (July, 1945), 101-108. References to "Bush" cite this article unless further specified, and all unexplained numbers and paginations in the text refer to it. Because the article is so tightly written and our interest in it so close, paginations are given to the column and paragraph.
2. Allen Kent, Textbook on Mechanized Information Retrieval, second edition. Wiley, 1966; pp. 7-9.
3. Joseph Becker and Robert M. Hayes, Information Storage and Retrieval: Tools, Elements, Theories. Wiley, 1963; p. 40.
4. John H. Wilson, "As We May Have Thought." In Progress in Information Science and Technology: Proceedings of the American Documentation Institute 1966 Annual Meeting. Adrianne Press, 1966; pp. 117-122.
5. Vannevar Bush, "Memex Revisited." In Vannevar Bush, Science Is Not Enough. William Morrow and Company, Inc., 1967; pp. 75-101.
6. Douglas C. Engelbart, "Augmenting Human Intellect: A Conceptual Framework." AFOSR-3223 Summary Report, Stanford Research Institute, Menlo Park, California, October 1962. This expresses the philosophy, rather than the results, of the continuing project.
7. Theodor H. Nelson, "Getting It Out of Our System." In George Schechter (ed.), Information Retrieval: A Critical View. Thompson Book Company, 1967; pp. 191-210.