How Easy Is It to Search the Complete Archives of the New York Times Online?





Mr. Padilla is an HNN intern.

In case you missed it, the New York Times announced in 2006 that its full catalog of articles would become available to the general public in an online archive. Since the paper first came off the presses in 1851, that makes for quite a cache of historical sources: a comprehensive set spanning more than 155 years and more than 13 million articles. Whether you are an academic or just an average Joe with a penchant for the past, the New York Times Archive can serve you well.

Access

For this review two different points of access to the contents of the New York Times Archive were examined. The first, and the one that most people will likely use, is the archive hosted by the Proquest Archiver system, available through a paid membership at nytimes.com. Launched in September 2005, TimesSelect is a “premium” service that offers a number of benefits not included in the free content on the website, such as certain Op-Ed pieces, early access to the Sunday Times, and, of particular interest here, articles stored in the archive. TimesSelect is available as either a monthly or a yearly subscription. For $7.95 a month, access is provided to 100 articles per month; for $49.95 a year, to 1200 articles per year. If you choose not to upgrade your initial free membership to TimesSelect, articles will set you back $4.95 apiece. If you plan to use the archive fairly frequently, it makes the most sense to upgrade to TimesSelect. The great thing about this system is that it allows virtually everyone to view the contents of the archive for a relatively low price; the downside is that it costs money, while other versions of the Archive hosted by Proquest do not cost a penny.
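To make the arithmetic behind that recommendation concrete, here is a minimal sketch in Python. It uses only the prices quoted above; the function names are my own, and it assumes you use the full quota when computing the per-article cost.

```python
# Prices as quoted in this review; everything else here is illustrative.
MONTHLY_FEE, MONTHLY_QUOTA = 7.95, 100
YEARLY_FEE, YEARLY_QUOTA = 49.95, 1200
PER_ARTICLE = 4.95  # price of a single article without TimesSelect


def cost_per_article(fee, quota):
    """Effective price per article if the whole quota is used."""
    return fee / quota


def break_even(fee, single_price=PER_ARTICLE):
    """Smallest number of articles at which the subscription fee
    is lower than buying the same articles one at a time."""
    n = 1
    while n * single_price <= fee:
        n += 1
    return n


print(round(cost_per_article(MONTHLY_FEE, MONTHLY_QUOTA), 4))  # 0.0795
print(round(cost_per_article(YEARLY_FEE, YEARLY_QUOTA), 4))    # 0.0416
print(break_even(MONTHLY_FEE))  # 2
print(break_even(YEARLY_FEE))   # 11
```

In other words, the monthly plan pays for itself with the second article in a month, and the yearly plan with the eleventh article in a year.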

The second is the Proquest database (ProQuest Historical Newspapers: The New York Times, 1851 - 2003), to which many universities subscribe. Access to this database is free at local and college libraries across the country, though of course some libraries do not subscribe to the service. In research for this article I found that some small city libraries subscribe to the service and some large city libraries (like Oakland's) do not.

Searching the Archives

So how do they compare in terms of use and content provided?

The archive interface available at TimesSelect is simple and easy to use. After you log into your TimesSelect account and arrive at the “Members Center,” scrolling down takes you to the heading NYT Article Archive: 1851-Present and a choice between searching the date range 1851-1980 (content organized by Proquest) and 1981 to the present (which I suspect is provided by nytimes.com itself). Accordingly, the search methodology and the extent of the content provided differ between the two date ranges.

The date range 1851-1980 provides articles with accompanying charts, graphs, and photos (where permission allows) in PDF format. As a result, you must have Adobe Acrobat Reader to view the articles. The date range 1981-Present does not include charts, graphs, or photos; its articles are in text-only format.

The search fields provided for the two ranges are virtually identical. Clicking on the “Advanced” search option reveals the optional search fields “Headline” and “Author,” along with a choice of date range restrictions. A tab to the right of the main search box allows you to switch back and forth between the date ranges 1851-1980 and 1981-Present.

Search strategies for 1851-1980 and 1981-Present are different. For 1981-Present the archive appears to support a form of phrase searching, that is, a search that requires quotation marks around a specific term or topic such as “affirmative action.” Additionally, terms can be required or excluded by placing the symbols + and - in front of each search term. For example, a search might be conducted as +Activists +Peace -San Francisco.
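As a rough illustration of how + and - terms narrow a result set, here is a minimal sketch in Python. It is not the archive's actual implementation, which is not public to me; the function names are hypothetical, matching is plain case-insensitive substring matching, and I assume a multi-word term has to be quoted to be treated as one phrase.

```python
import re


def parse_query(query):
    """Split a query such as '+activists +peace -"san francisco"'
    into lists of required and excluded terms (lowercased)."""
    required, excluded = [], []
    # Each token is a + or - sign followed by a quoted phrase or a single word.
    for sign, phrase, word in re.findall(r'([+-])(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        (required if sign == "+" else excluded).append(term)
    return required, excluded


def matches(text, query):
    """True if the text contains every + term and none of the - terms."""
    required, excluded = parse_query(query)
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)


query = '+activists +peace -"san francisco"'
print(matches("Peace activists march in Oakland", query))        # True
print(matches("Peace activists march in San Francisco", query))  # False
```

Because the sketch matches substrings, +peace would also match "peaceful"; a real search engine may well behave differently.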

The archive spanning 1851-1980 supports more types of search methods. It allows you to search for spelling and word variations through a method known as truncation. For example, instead of separately searching political, politicize, politics, and politician, you could search politi* to gather all the variations of the word in the body of articles. It's almost like using a thesaurus to call up related search terms that might not be readily apparent during research.
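Truncation is easy to picture as a pattern match on the start of a word. The sketch below, in Python, converts a truncated term into a regular expression and lists the variants it picks up in a sample sentence. The helper names and the sample text are mine, and the archive's own matching rules may differ.

```python
import re


def truncation_to_regex(pattern):
    """Turn a truncated term such as 'politi*' into a whole-word,
    case-insensitive regular expression."""
    stem = re.escape(pattern.rstrip("*"))
    # A trailing * means "any run of word characters may follow the stem".
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def find_variants(text, pattern):
    """Return the distinct words in the text that match the truncated term."""
    regex = truncation_to_regex(pattern)
    return sorted({m.group(0).lower() for m in regex.finditer(text)})


sample = ("The politician said politics should not politicize "
          "a political trial. Polite applause followed.")
print(find_variants(sample, "politi*"))
# ['political', 'politician', 'politicize', 'politics']
```

Note that "Polite" is not returned, since it does not begin with the stem politi.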

Another type of search method is Boolean logic, which you may have come across in other databases. Boolean searching, named after the nineteenth-century mathematician George Boole, who worked on logical ways to formulate precise queries, relies on “operative” words placed between search terms. Operative words include but are not limited to AND, OR, and AND NOT. This system is quite effective. For instance, a specific search can be conducted on photojournalism AND war AND ethics to confine these related words to the same paragraph, so that the terms are not scattered throughout the document without the relation that you seek in your research.
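The operators described above can be sketched in a few lines of Python. This is only an illustration of the logic, not of how Proquest implements it: the helper names are hypothetical, matching is by case-insensitive substring, and the sample paragraphs are invented. The query is applied to whatever units of text you pass in, here individual paragraphs.

```python
def term(word):
    """Predicate that is True when the word occurs in a text."""
    needle = word.lower()
    return lambda text: needle in text.lower()


def AND(*preds):
    """True only if every sub-query matches."""
    return lambda text: all(p(text) for p in preds)


def OR(*preds):
    """True if at least one sub-query matches."""
    return lambda text: any(p(text) for p in preds)


def AND_NOT(keep, drop):
    """True if the first sub-query matches and the second does not."""
    return lambda text: keep(text) and not drop(text)


def search(texts, query):
    """Return the texts that satisfy the query."""
    return [t for t in texts if query(t)]


paragraphs = [
    "Photojournalism in war raises questions of ethics.",
    "The war ended in 1945.",
    "A course on ethics and photojournalism.",
]

print(search(paragraphs, AND(term("photojournalism"), term("war"), term("ethics"))))
# ['Photojournalism in war raises questions of ethics.']
print(search(paragraphs, AND_NOT(term("ethics"), term("war"))))
# ['A course on ethics and photojournalism.']
```

AND narrows a search, OR widens it, and AND NOT removes a subset, which is why combining them gives such precise control over results.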

Overall the range in the New York Times Archive from 1851-1980 offers more content, pictures, graphs, and photos, along with more thorough and academically minded search abilities than the more recent range.

The Proquest Database available at libraries is more academic. That is to say, it is the most detailed collection of sources that you will find on the New York Times online. It supports truncation, searches based on Boolean logic and operative connectors, and phrase searching. However, it goes further by offering the option of narrowing a search by way of 17 different types of information using search field syntax. Search field syntax includes but is not limited to Abstract, Author, Citation and Abstract, Citation and Document Text, Date (Alpha), and Date (Numeric), and is to be used in specific indexes among the eleven available in advanced mode from a pull-down menu. Specific details on this type of searching can be found here. This Database offers content with all photos, graphs, and pictures from 1851-2003. Its coverage stops there for the time being.

Content from both TimesSelect and the Proquest Database is available solely for personal and private use. Any other use requires express permission from the New York Times. For more detailed information on what you can and cannot do with articles taken from the archive, consult the New York Times website.

The question that begs to be asked at the end of this article is whether the bound version of the New York Times Index and microfilm, as physical media of research, have become obsolete as online alternatives advance. The answer is yes and no.

Using the New York Times Index from 1886, I looked for articles on or related to the Knights of Labor. Following the six-month split format provided in the bound version, the search covered January 1, 1886 to June 30, 1886. The term Knights of Labor in the text directed me to the Labor heading, which contained the subheading Knights of Labor. Including articles within and outside the subheading in the general labor category yielded 31 search results. Attempting to approximate as closely as possible the search methodology underlying the bound version of the New York Times Index, I turned to the Proquest online database. Performing a search on Knights of Labor using phrase searching within the body of the texts in the time period above produced 396 search results. Narrowing this search so that it scanned only document titles yielded 51 results. A search on Chiang Kai-shek in the alphabetically organized 1975 bound New York Times Index yielded 21 results. The Proquest database yielded 79 results based on searches of the body of the text and 4 on title alone.
 
The Proquest academic database clearly offers more results in any given search. It makes articles immediately and easily accessible in a way that the bound New York Times Index cannot. You type in your inquiry, the page loads, and hundreds of results are available at the click of a button. No sorting through microfilm, no manual loading, no waiting, no expensive printing fees; files can be saved to your computer to be reviewed any place you please.

Despite all of these positives, however, the bound index still manages to contain references to articles that the online database misses. For instance, the search on the Knights of Labor misses the article referenced in the bound version as "Homer Wagon Company's Refusal to Employ Knights of Labor." Jan. 7--3. A similar phenomenon occurs when comparing search results on Chiang Kai-shek.
 
It is not as if the sources referred to in the bound index are unavailable in the online database. They are there. Using the title of the article, they can be located immediately, but for some reason they do not appear unless specifically searched for. For the serious historian who cannot afford to miss the "one" source, it is apparent, much to the disappointment of those who treasure convenience, that the library must still be frequented; that is, if you want to make absolutely sure that you see everything related to your topic.
 
In addition, the bound New York Times Index and microfilm cannot be so easily written off because of a recent United States Supreme Court decision in a case brought by freelance writers who had contributed to the New York Times in the 1990s. The suit accused the paper of infringing the copyright on their work. In 2001 the Court ruled in the writers' favor in New York Times Company, Inc., et al. v. Jonathan Tasini, et al. As a result, portions of the New York Times archive were removed and at present cannot be included online. The response of the New York Times to the Supreme Court decision can be found here.
   
Have the bound version of the New York Times Index and the physical search of microfilm become obsolete as online alternatives grow more comprehensive? No, they have not. Historians still need libraries. Issues of legality and index methodology still restrict certain resources. However, the continued necessity of the library does not undermine the easy and powerful tool that databases like the online New York Times Archive provide to us all.



More Comments:


Beth Maser - 2/26/2007

Here is one argument for keeping the microfilm version of the NYT: it holds things that NEITHER the print nor the online databases index. Two things come to mind: advertisements and classified advertising. It is essential to keep the microfilm versions of the NYT for these reasons alone. I have specialized in manual newspaper searches for years and I cannot tell you how many times I have had to search various newspapers for specific advertisements for litigation, or have had to search past help wanted advertisements for certain types of jobs.

In this case, no index, print or online is helpful to me. I still have to search each page by hand.


Oscar Chamberlain - 2/22/2007

My thanks to Thomas and to all of the others here for the information. It's helped me materially.

I've also passed it on to our reference people (who, I am happy to say, have worked with the history department here on a number of projects).

A question: does any organization have a commitment to doing or collecting reviews of databases and their search engines? A single source for reviews such as this would be wonderful.


Alonzo Hamby - 2/19/2007

Both responses to this fine article are apt. I'll throw in an example from my own experience.

Try using New York Times Historical (Proquest) to look for "Franklin Roosevelt" during the year 1911. You come up with nothing. Just enter "Roosevelt" and you will be overwhelmed with many references to Theodore Roosevelt as well as a few to Franklin.
Enter "F.D. Roosevelt" (the way he is referred to in the newspaper text) and you will get a response. Look for "Roosevelt, Franklin D." in the paper index, and you will find him.
My own conclusion: the paper index is still essential. Don't let your library throw it out. And don't let your library stop its microfilm subscription either. The microfilm lets you see a whole page, provides context, and facilitates serendipitous discoveries. Proquest gives you an isolated article.
A final thought: It is unfortunate but true that many reference librarians have not the slightest idea about how historical research is actually conducted.


Edwin Moise - 2/19/2007

I believe the reason for the false negatives (cases in which an online search for a keyword will fail to turn up some articles from the New York Times in which the keyword actually appeared) is that the search engine does its searching in a database that was compiled by computer scanning of the articles. The type in the New York Times is often unclear, and computers do not scan it very accurately. If the computer scanning an article read "Chiang Kai-shek" as "Chlang Hai-shek" it will not report a hit when you scan for "Chiang Kai-shek". Your chances are better if you choose keywords that will have appeared more than once in any article on the subject you are looking for. The computer will report a hit if the keyword was scanned accurately in at least one of the places where it appeared. I have found articles containing the place-name "Peiping" that did not show up on a keyword search for "Peiping", by doing a keyword search for "China".


Sterling Fluharty - 2/19/2007

Thanks for the review. Things can get even more complicated than you described. Some topics will be listed under multiple synonyms. The terminology for many racial/ethnic groups has changed over time. One of the nice things about the paper index is that it will often list cross-references. When you conduct searches online, the database does not notify you if there are cross-references available that would yield additional relevant results. These are just a couple of reasons why I think you are right about still needing to use the old index with the new technology.

History News Network