Robert Townsend: Google Books ... What's not to like?
[Mr. Townsend is a graduate student in history at George Mason University.]
The Google Books project promises to open up a vast amount of older literature, but a closer look at the material on the site raises real worries about how well it can fulfill that promise and what its real objectives might be.
Over the past three months I spent a fair amount of time on the site as part of a research project on the early history of the profession, and from a researcher’s point of view I have to say the results were deeply disconcerting. Yes, the site offers up a number of hard-to-find works from the early 20th century with instant access to the text. And yes, for some books it offers a useful keyword search function for finding a reference that might not be in the index. But my experience suggests the project is falling far short of its central promise of exposing the literature of the world, and is instead piling mistake upon mistake with little evidence of basic quality control. The problems I encountered fit into three broad categories—the quality of the scans is decidedly mixed, the information about the books (the “metadata” in info-speak) is often erroneous, and the public domain is curiously restricted.
Poor Scan Quality
My reading of the materials was not scientific or comprehensive, by any means, but a significant number of the books I encountered contained basic scanning errors. For instance, the site currently offers a version of the Report of the Committee of Ten from 1893 (the start of the great curriculum chase for the secondary schools). That scan is a catalog of errors: Google has double-scanned pages (page 3 appears twice, for instance), pulled in pages improperly so they are now unreadable (page 147 appears between pages 164 and 166), and cut off others (page 146, for example).
I’ve digitized a number of the AHA’s old publications and appreciate that scanners don’t always work as they should and pages can often get jammed. But even fairly rudimentary quality controls should catch those problems before they go live online. After years of implementing those kinds of quality checks here—precisely because friends in the library community took me to task about their necessity—I find it passing strange that so many libraries are joining in Google’s headlong rush to digitize without similar quality requirements....
Read entire article at AHA Blog