Roy Rosenzweig: Digital Archives Are a Gift of Wisdom to Be Used Wisely

Roy Rosenzweig, in the Chronicle of Higher Education (6-24-05):

[Mr. Rosenzweig is a professor of history and new media at George Mason University and director of the university's Center for History and New Media. He is co-author, with Daniel J. Cohen, of Digital History: A Guide to Preserving, Presenting, and Gathering the Past on the Web, scheduled to be published in the fall by the University of Pennsylvania Press.]

"What's the big deal?" was the grumpy question of a fellow participant in a workshop at the Library of Congress in the summer of 1996. The library was showing off its still very new digital archive, which it had dubbed American Memory. The workshop aimed to show how the Web-based repository of photographs, documents, newspapers, films, maps, and sounds could transform teaching. My colleague, who taught at a major research university, was unpersuaded. "I'd rather send students to the library," he announced.

But to me, it was a big deal -- a very big deal -- and the answer to a problem I had been grappling with for more than 15 years. When I started teaching as a graduate student in the mid-1970s, I quickly learned that the best way to excite students about my field, history, was to involve them directly with the "stuff" of the past -- the primary sources -- and to show them, by asking them to do it, what it means to think like a historian. As a graduate-student instructor, that was pretty easy. After all, I was at another of those big research institutions (Harvard University) with one of the nation's greatest libraries. I could "send students to the library," and in a short walk from their dorms, they could find more primary sources than they could exhaust in a lifetime.

When I arrived at George Mason University in the fall of 1981 as an assistant professor, things suddenly became much harder. We had a very modest library in those days. And more problematic from the perspective of a 19th- and 20th-century American historian, it was a very new library, with relatively few old books, journals, and magazines. I could "send students to the library," but they would not find the rich bodies of primary sources that Harvard had in abundance. A simple assignment asking them to compare advertisements in two popular magazines of the 1920s was out of the question, especially in an evening section of my survey course, filled with students who could not journey to more-distant libraries because of full-time jobs and family responsibilities.

I now know that my experience was not unique but was shared by scholars in many different fields, at many different institutions. Since then, however, much has changed in the world of Web-based teaching: We have an array of new opportunities, but we also have new limitations that we haven't yet confronted.

I spent a lot of time in the 1980s devising less-than-satisfactory strategies to work around the constraints -- photocopying piles of documents myself and putting them on reserve, for example. But in the latter part of the decade, I began to glimpse a solution. I read in computer magazines about this new thing called the CD-ROM, which could hold thousands of pages of text as well as photographs, sound files, and (later) moving pictures. In the early 1990s, I joined with my friends Stephen Brier and Joshua Brown at the American Social History Project, based at the Graduate Center of the City University of New York, to produce, with the help of the Voyager Company, such a disk. When Who Built America? appeared in 1993, we promoted it with an enthusiasm that now seems quaint. We would hold up the silvery, thin disk and exclaim (often to incredulous audiences) that it contained: Five thousand pages of text! Seven hundred images! Four hours of oral history, music, and speeches! Forty-five minutes of film!

Actually, our enthusiasm was already becoming dated in 1993. That year brought a much more momentous development for the future of technology and teaching than the publication of our CD-ROM -- the appearance of Mosaic, the first easy-to-use graphic Web browser that ran on most standard computers. Between mid-1993 and mid-1995, the number of Web servers -- the computers that house Web sites -- jumped from 130 to 22,000.

Progress in the last 10 years has been nothing short of astonishing. The Library of Congress's American Memory project now presents more than nine million historical documents. The New York Public Library's Digital Gallery contains more than 300,000 images digitized from its extraordinary collections. PictureAustralia presents 770,000 images from 28 cultural agencies in that country; the International Dunhuang Project, a cross-national collaboration, serves up 100,000 digitized images of artifacts, manuscripts, and paintings from the trade routes of the Silk Road. Most dramatically, the search-engine behemoth Google has announced plans to digitize at least 15 million books. Hundreds of millions of federal, foundation, and corporate dollars have already gone into digitizing a startlingly large proportion of our cultural heritage, and more is to come.

That is about as dramatic a development in access to cultural resources in a single decade as any of us are likely to see in our lifetimes, and it has opened up enormously exciting possibilities for teachers not just of American history and culture but in numerous disciplines that have experienced similar transformations. To be sure, not everything will become digital (nor should it), but where we instructors once struggled with the scarcity of documents for our students to use, we now participate in what John F. McClymer, a historian at Assumption College, calls a "pedagogy of abundance." The developments in history are broadly illustrative of both the possibilities and the problems of that pedagogy...

...Perhaps less well recognized is that the same algorithmic procedures behind Google, combined with the direct access that the company (as well as Yahoo) offers to its data, open up more-advanced possibilities for sorting out good and bad information mathematically. For example, Dan Cohen, my colleague at the Center for History and New Media at George Mason, has developed H-Bot, the Automated Historical Fact Finder, which can answer historical questions like "When did Charles Darwin publish The Origin of Species?" with a surprising degree of accuracy, simply by querying Google and analyzing the results statistically.
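The statistical idea can be illustrated with a toy sketch in Python. This is not H-Bot's actual code, which the article does not describe in detail. The function name `most_likely_year` and the sample snippets are hypothetical, and the snippets are made-up stand-ins for search results rather than output from any real search engine. The sketch simply counts which four-digit year appears in the most snippets and reports what share of the snippets mention it.

```python
import re
from collections import Counter

# Matches four-digit years from 1000 through 2099 (an assumed range for this sketch).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def most_likely_year(snippets):
    """Return (year, share) for the year mentioned in the most snippets.

    `share` is the fraction of snippets that mention the winning year.
    Returns (None, 0.0) when no snippet contains a year.
    """
    counts = Counter()
    for text in snippets:
        # Count each year once per snippet so one repetitive page cannot dominate.
        counts.update(set(YEAR_PATTERN.findall(text)))
    if not counts:
        return None, 0.0
    year, votes = counts.most_common(1)[0]
    return int(year), votes / len(snippets)


# Hypothetical snippets standing in for search results about Darwin's book.
sample_snippets = [
    "Darwin published On the Origin of Species in 1859.",
    "The Origin of Species (1859) introduced natural selection.",
    "Charles Darwin, born 1809, published his major work in 1859.",
    "Some sources wrongly give 1858 for the first edition.",
]

year, share = most_likely_year(sample_snippets)
print(year, share)
```

The point of the sketch is that a wrong date on one page is outvoted when most pages agree, which is also why such a method can only be as reliable as the consensus it samples.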

But even the most refined statistical and mathematical tools are unlikely to be able to make the kind of qualitative judgments historians often need to make. A second set of more social mechanisms -- nascent forms of peer review -- help keep students away from the bogus documents and poor-quality archives they will inevitably encounter online. Just as the Web has spawned plenty of problematic history Web sites, it has also provided a platform for dozens of Web resources with the goal of steering people away from those sites. For example, Thomas Daccord, a high-school teacher at Noble & Greenough School, in Dedham, Mass., has created Best of History Web Sites. History Matters: The U.S. Survey Course on the Web (developed by the social-history project at CUNY and the new-media center at George Mason) annotates the 850 best Web sites in American history; a sibling, World History Matters, at George Mason, has begun to do the same in that field...

For example, the Thomson Corporation offers Eighteenth Century Collections Online, which includes "every significant English-language and foreign-language title printed in Great Britain during the 18th century" -- 33 million text-searchable pages and nearly 150,000 titles. "We own the 18th century," a Thomson official boasts. Those who want their own share must pay handsomely. A university with 18,000 students can spend more than half a million dollars to acquire the full collection, depending on discounts it receives and other pricing factors. Another extraordinary digital collection, ProQuest Historical Newspapers, contains the full runs of a number of major newspapers. One of my colleagues uses it for weekly primary-source assignments that I could only have dreamed about back in 1981. But a typical university will have to shell out the equivalent of an assistant professor's salary each year to pay for those digital newspapers.

It seems churlish to complain about extraordinary resources that greatly enrich the possibilities for online research and teaching. Surely Thomson, ProQuest, and other businesses are entitled to recoup their multimillion-dollar investments in digitizing the past. But it still needs to be observed that not every college can pay the entry fee to this new digital world. Some may have to decide whether it is more important to have extraordinary digital resources or people to teach about them.

Thus we are in danger of reproducing the information divide of yesterday -- where the richest universities with the biggest physical libraries could offer students far better access to materials than other institutions. Of course there are powerful counters to commercialization, especially the support that public agencies and private foundations have provided for digitization and "open content," as well as the eclectic and energetic efforts of enthusiasts and scholars who continue to post primary sources out of a passion for their fields.

But even when students have equal access to online resources, they do not necessarily have equal ability to make effective use of the new, global resource. For many students, the abundance of primary sources can be more puzzling and disorienting than liberating and enlightening. Sam Wineburg, a cognitive psychologist who teaches at Stanford's School of Education, has spent 20 years observing classrooms and talking with both teachers and students about how students read (and misread) historical sources. As his research shows, instructors commonly overstate students' ability to analyze primary sources, failing to recognize the challenges that thwart understanding.

In my field, what do students make of the tens of thousands of photographs from the Farm Security Administration put online by the Library of Congress? Most often they see such powerful sources as transparent reflections of a historical "reality"; not, as a historian would, as imperfect refractions -- ideological statements by reform-minded photographers who wanted to expose the poverty brought on by the Great Depression and advance the programs of the New Deal. In the resonant phrase of Randy Bass, a professor of English at Georgetown University and director of the university's Center for New Designs in Learning and Scholarship, the Web has for the first time put "the novice in the archive," giving access to people who were previously barred by the time and expense of getting to archives, or by the entrance requirements imposed by such collections. But novices still lack the skills for critically evaluating primary sources...

...For the moment, the danger for students venturing onto the Web is not that they will find either bogus letters or comic strips, but that they won't know how to "read" the vast number of valuable primary sources that they find. It remains to be seen whether we can create useful online aids that not only make information available, but assist users in learning to discriminate and analyze that information.

The larger lesson here is one that we should have learned over and over again in confronting new technology. The most difficult issues are economic, social, and cultural, not technological. The Web has given us a great gift -- an unparalleled global digital library and archive that is growing bigger every day. Our task now is to make sure that it remains accessible to all, and to turn the novices we have admitted to it into experts who can use it with intelligence and thoughtfulness. If we can succeed not just in democratizing access to materials like online historical evidence but also in helping students make sense of that evidence, that will be a very big deal.

