
Dec 6, 2005

The Keyword Revolution




Over the last couple of weeks I've been learning how to play with Google Print. Although the Print database is certainly not exhaustive, I've been blown away by how many books that interest me--from both trade and academic publishers--are available for full-text searching. And I've been even more impressed by the interface. You can see full-page images of published material, with your keywords highlighted on the page.

Of course, to have access to this resource, you have to be somewhat savvy, because there is not yet a portal page on Google's site for searching books. If you don't already know it, you can tap the vast resources of Google Print in one of at least two ways:

(1) When searching at Google, begin your search string with the word "book" or "books" and then enter your query as usual. If Google Print has book pages that match your query, you should see about two or three "book results" listed above your search results. (Example.) You can either click on the individual results or on the headline that sends you to all of your book results. (Example.)

From there you can search within particular books (check the sidebar of an individual result page), look at the index and table of contents for a book, and even scroll through about two or three pages around your result page. Once you are within Google Print, you can also "search all books" by using the form entry box located either at the top of the page or at the bottom. (Hat-tip: Search Engine Watch.)

(2) Another way to get into Google Print is to use this link and then enter your search at the top of the page. (Hat-tip: NT Gateway.)

The scholarly possibilities here are staggering. Google Print makes it possible, for instance, to search for published books that cite a certain book or article--a task that was difficult before without access to some kind of citation-tracking database. Most of all, Google Print makes it possible to see whether there are books that mention a particular name or word, even in passing--something that was nearly impossible to do before.

For instance, if I want to see books that mention the Kentucky abolitionist "Cassius M. Clay," I just do this and get 76 hits. In the "real" world, as they say, I would have had to determine that those 76 books were relevant to Clay, find them on the shelf, and then hope that the book's author or editor had listed "Clay" in the index. All of this was possible before, of course, for academic journals and other kinds of periodical literature. And it was even possible in digital collections of historical books, like the Making of America site or the Samuel May Anti-Slavery Collection at Cornell. But with Google Print, the digital keyword revolution has truly arrived, and the end is not in sight.

What should we make of this revolution, and how revolutionary is it? In the latest issue of Perspectives, there's an article by Carlo Ginzburg considering that question. (There are also two fantastic articles on history blogging by my fellow Cliopatriarchs, Ralph Luker and Manan Ahmed.) Ginzburg argues persuasively that keyword searching in library catalogs is good for scholarship, primarily because "the computer multiplies the possibilities that an unforeseen fact will take us by surprise." (In the above search, for instance, I was surprised to see Clay mentioned in The Education of Henry Adams as one of the morose young man's diplomatic "masters." It turns out that Clay is listed in the index of my printed copy of Education, but I don't know that I would have looked there intentionally for a mention of Clay.)

Of course, that capacity for surprise is not limitless, because we must have some reason for entering in the keywords that we do, and usually our intuitions here are guided by our prior research or the work of others. But it is significant that keyword searches allow us to navigate through texts largely without the mediation of editors, authors, and publishers.

On the other hand, the excitement of surprise can be misleading to a researcher. The temptation when doing keyword searches is always to think that your results are more representative than they are. (This is something I've mused about before.) If I look in the printed index to a book and see one page listed for Clay out of 450 or 500, I can make a rough and ready judgment about how important he is in the context of that book. But when I look at a Google results page, I depend on Google's relevancy algorithms to make that determination for me, and it's easy to forget that when I'm looking at a long list of hits. (I can still tell in Google Print how many times a word appears in a book, and how many pages the book has, but the linearity and ephemerality of a results list can be seductive. It doesn't have the same weight in your hand that the actual book does, and perhaps, subconsciously, that actual, physical extension of the book in space helps our brains make determinations about proportionality and significance.) For all the virtues of keyword searching, then, this revolution warrants some careful reflection.

You can find such reflection in a recent article by David Bell in The New Republic on "The Bookless Future." (Full disclosure: Professor Bell is the incoming Director of Graduate Studies in the Johns Hopkins history department, where I am pursuing said graduate studies.) Unfortunately, and perhaps ironically, "The Bookless Future" is only available online to subscribers. But I found the full text by using Hopkins' institutional subscription to Lexis Nexis and strongly recommend it if you can find a copy.

The bulk of the article (if, following my musings above, it is not a category mistake to talk about the "bulk" of hypertext) wonders about the future of electronic books, and it canvasses several kinds of technology, currently in development, that will hopefully make electronic books easier to read. I think Bell is right that the only thing missing is a vehicle for text that is as optimal for reading as a printed book. The technology to scan full-page images of books and make them searchable is clearly already upon us; it won't be too much longer, I predict, before you can pay a fee and pull a book from Google Print onto your PDA or some other electronic device.

But Bell also expresses warranted concern about the deleterious effects these changes might have on the practice of reading:

The very nature of the computer presents a different problem. If physical discomfort discourages the reading of [online] texts sequentially, from start to finish, computers make it spectacularly easy to move through texts in other ways--in particular, by searching for particular pieces of information. Reading in this strategic, targeted manner can feel empowering. Instead of surrendering to the organizing logic of the book you are reading, you can approach it with your own questions and glean precisely what you want from it. You are the master, not some dead author. And this is precisely where the greatest dangers lie, because when reading, you should not be the master. Information is not knowledge; searching is not reading; and surrendering to the organizing logic of a book is, after all, the way one learns.

If my own experience is any guide, "search-driven" reading can make for depressingly sloppy scholarship. Recently, I decided to examine the way in which the radical eighteenth-century thinker d'Holbach discussed warfare. I could have read his book Universal Morality in the rare-book room of my university library, but I decided instead to download a copy (it took about two minutes). And then, faced with a text hundreds of pages long, instead of reading from start to finish, I searched for the words "war" and "peace." I found a great many juicy quotations, which I conveniently cut and pasted directly into my notes. But at the end, I had very little idea of why d'Holbach had written his book in the first place. If I had had to read the physical book, I could still have skimmed, cut, and pasted, but I would have been forced to confront the text as a whole at some basic level. The computer encouraged me to read in exactly the wrong way, leaving me with little but a series of disembodied passages.

This has often been my troubling experience as well: Henry Adams makes a great quip about Clay, for instance--as a teacher, Clay had "no equal though possibly some rivals." But having previously submitted myself to the organizing logic of Adams' book by reading it cover to cover, I know better than to take Adams' quips at face value. (Sure enough, according to an editor's footnote, Adams referred to Clay in private as a "noisy jackass.") I wonder, though, whether I'm as careful with books that I haven't read. The keyword revolution at least means that I need to be especially careful--I need to balance the subversive virtues of keyword search (the "surprise" of which Ginzburg speaks) with the virtues of "surrendering to the organizing logic of a book."

All of this got me wondering, though, about whether the dangers of "strategic, targeted" reading are really that new. After all, the printed index compiled by an author or editor presents the reader with the same potential for targeted reading, and it is the rare researcher who does not rely heavily on these indexes to quickly jump to parts of a book that are relevant to his or her research. (Here are three papers online that allude to the similarity between online and offline indexes.)

The index, like the codex, predates the printed book. According to Guglielmo Cavallo and Roger Chartier in their edited collection, The History of Reading in the West,

Even beyond its immediate derivation from the manuscript, the book--both before and after Gutenberg--and the manuscript were similar objects composed of sheets folded and gathered into quires and assembled within one binding or cover. It is thus hardly surprising that all the systems of reference that have somewhat hastily been credited to printing existed well before its invention. One of these was the use of signatures and catchwords to help assemble the pages in the right order. Other signalling devices aided reading: folios, columns, or lines might be numbered; the page could be divided up more visibly by the use of devices such as ornamented initials, rubrics and marginal letters; an analytical (rather than a simple spatial) relationship between the text and its glosses could be set up; different characters or different colours of ink could be used to distinguish between text and commentary. Thanks to its organization in quires and to its clear divisions, the codex, whether manuscript or printed, was easy to index. Concordances, alphabetical tables and systematic indexes were common practice even in the age of the manuscript, and it was in monastic scriptoria and stationers' workshops that these modes for the organization of written material were invented. Printers picked them up later. (p. 23)

And programmers picked them up even later. It would be an interesting research question to see (and maybe a medieval historian can correct me if this has already been done) whether the invention of the index in the age of manuscript provoked the same kinds of anxieties we feel today about targeted access to texts. One of the contributors to the Cavallo and Chartier volume, Jacqueline Hamesse, suggests that scholastic modes of reading were shaped in part by these innovations. Unlike monastic readers, scholastics could jump from page to page and cross-reference works without the same kind of intensive, devotional reading:

"Here we enter into a new world that suggests modern reading habits. After the pioneering labours of the Cistercians to organize the content of a manuscript, other aids appeared and flourished: the table of contents, the concept index, concordances of terms, alphabetically arranged analytical tables, summaries and abridgements. Even the great twelfth-century summae were abridged: they were admittedly easier to handle when reduced to a single volume. The abridgements were a pale reflection of the originals, however.

The rise of this new literary genre inevitably meant that reading was no longer direct: now a compiler served as an intermediary, and reading was filtered by selection. Reference to the book changed. Its contents were no longer studied for themselves with the aim of acquiring a certain wisdom, as Hugh of Saint Victor had recommended. Henceforth knowledge was primary, and it took precedence over everything else, even when it was fragmentary. Meditation gave way to utility in a profound shift of emphasis that completely changed the impact of reading.

Certain scholars are quite aware of the important role of these working tools for learning in the Middle Ages, but others have failed to grasp their influence among intellectuals. As any fourteenth-century inventory will show, florilegia, concordances and tables abounded, not only in the libraries of the religious Orders, but also in college and university libraries. Such compilations often replaced consultation and, a fortiori, direct reading of authors' works, and even though they constitute a second-tier literature, their sizeable role in the intellectual preparation of medieval men cannot be denied. Today we have such different methods for acquiring culture that it is difficult for us to comprehend that even the great writers of the age of scholasticism made use of these handy tools for easy access to documentation that was indispensable to their work. The large number of manuscripts that have come down to us bear witness to the use and dissemination of such compilations. (p. 110)

Of course, electronic keyword searching takes concordances to another level. But perhaps this is a good thing. The etymological roots of "concordance" are, after all, entangled with the roots of "concord," and it is sometimes good to introduce discordance into our readings of texts. If Ginzburg is right, then we have a real advantage over our scholastic forebears; unlike them, we don't have to rely on the compilations of other scholars, who might use indexes as a way to assert too much control over the text. But if Bell is right, then we also have a greater responsibility to handle that advantage with care, and to prevent our liberty from becoming license.

You can be the judge of whether I've done that here, because (in a burst of self-referentiality) I found the quotes from the Cavallo and Chartier book by using Google Print, and I've never read the whole thing. When bloggers advise readers to "read the whole thing," do they really mean it? And do we ever really follow that advice?

(Cross-posted at Mode for Caleb.)




More Comments:


Greg James Robinson - 5/20/2005

That said, there are most certainly different ways legitimately to use a history book. One is to read it for the whole, another is to read a specific chapter, and then there is using it as a research tool and seeking specific information (including picking it up to see if your own work is cited or if you are mentioned in the acknowledgements!) Suffice it to say, even when we are reading whole books, we may not be able to, or wish to, sit down and just enjoy them, for a number of reasons. A onetime colleague of mine quoted a professor as saying that "A history book is not a novel". Part of the job of being a historian is developing the skill to go through books and pick out the important arguments. Of course, just as an index is a helpful tool but must be used with caution, so too a computer can help, but must be used with caution.



Caleb McDaniel - 5/20/2005

That's an interesting point. There's only so much, of course, that a computer can do. Google's routines can tell me if the words I want are on a page, but they're not smart enough to know with certainty that this is the page I want. In other words, a computer can't read. (Even if it can understand some natural language commands, it's not really reading until it can become fluent enough in those commands to interpret natural language about which it has not been told.)


Caleb McDaniel - 5/20/2005

I think you're right: pillaging is an indispensable part of research, and I think the Bell article recognizes as much. The problem is not with this method of reading per se, since in order to deal with the wealth of information available, we have to learn how to skim and target our reading. The problem is if we allow this method of reading to become our only method of reading.

Really I think the problems raised by targeted reading are just as old as hermeneutics itself. It's the old hermeneutical circle: to understand the part, you have to have some sense of the whole, but you can't understand the whole without understanding the parts. If we rely on targeted reading and pillaging exclusively, then basically we're losing one half of that circle.

Pillaging is a somewhat ominous metaphor, by the way! Hopefully we're not cutting Wagnerian swaths of destruction through the archives! ;-)


John H. Lederer - 5/19/2005

It is worth noting that the key concept in both programming and digital hardware design is to break the problem down into the smallest discrete units possible and then to assemble these to perform tasks.

It appears that, willy nilly, the computer is doing the same thing to written texts when subjected to indexing and searching.

There are obvious negatives to this, but I also suspect there may be some very large as yet unexplored positives.

The concept has been a very powerful one in programming, allowing, for instance, the use of a particularized command set, and the creation of reusable subroutines.


Oscar Chamberlain - 5/19/2005

I took my doctoral courses in the late 1980s and early 1990s, just before the computer revolution really hit. One of the things I quickly realized was that I could not read the way I had when I was simply reading history for pleasure.

Instead, in my daily pursuit of specialized knowledge to obtain and prove my understanding of history and historiography, I ripped my way through articles and books for the precise information I needed. I usually tried to reshelve carefully, but when there was not time to be careful, I left piles and piles of books and journals strewn on library tables in the D's, E's, F's, J's, and K's.

I called it pillaging.

However, when I pillaged articles and books, even those with a good index, I was forced to skim and read sections of the material. Sometimes, in my curiosity- (or ADHD-) driven way, I would find myself reading all sorts of interesting material that I did not need that week but did not want to put down.

I think you are right, therefore, that googling and text searching one's way through material is not unprecedented. The mentality is not too different from my pillaging. However, I wonder if the technology makes it less likely for the student/researcher to get lured and even hooked by the "irrelevant" material. If that is the case, it would be, in all seriousness, a serious loss to the student's education.

Because over the long haul of scholarship, the tangents we go off on are as important to our learning as the focused research.

PS I believe firmly that the lack of time graduate students have to slow down and really read is one reason most historians don't write very well.