Edwin Black: Why is the US Holocaust Memorial Museum Claiming It's Difficult to Search the Bad Arolsen Archive When It's Not?
When probing the Holocaust, the horrific experiences of survivors, the listener melts. We all melt at the enormity of the horror. Tattoos always trump the arcane questions of technology. But professionals who study the Holocaust beyond the blood and bones of mass murder know information technology was an indispensable behind-the-scenes factor in the original crime. Seventy-five years after Adolf Hitler came to power, information technology is again an indispensable behind-the-scenes factor, this time in exposing the crime.
This brings the Holocaust community to the continuing controversy over providing survivors remote secure access terminals to the Bad Arolsen archives instead of making them travel to the United States Holocaust Memorial Museum in Washington, D.C. to obtain details of their incarceration and enslavement. The USHMM is refusing to share access with other Holocaust institutions and now claims it will begin “individualized research” for the estimated 150,000 survivors in America and perhaps many among a million worldwide, all of whom want answers today, not tomorrow, and do so with an initial staff of 24 trained researchers. The Museum refuses to budge and is increasingly defensive about the persistent demands of Holocaust survivors and media inquiries about a seemingly obvious question.
During and after the January 17, 2008 USHMM press conference on the topic of the Bad Arolsen archival transfers, Museum executive director Sara Bloomfield made statements about the archival technology to the Jewish Telegraphic Agency (JTA), the 85-year-old, war-tested Jewish communal news service, known for its precision as well as its diligent, respected correspondents. Bloomfield’s remarks to the JTA reporter about why the Bad Arolsen files could not be shared, it seems, amounted to a calculated misinformation effort to pretend such sharing was impossible. The opposite is true. Her remarks were implausible on their face, and completely contrary to the published facts.
The JTA report stated: Much of the material delivered to the museums on hard drives packed into suitcases is not yet digitally searchable; images of the documents and 50 million index cards that arrived between August and November of last year are in jpeg form. Converting those images to searchable files will take much time and millions of dollars, officials of the U.S. Holocaust museum said at a news conference last Thursday morning, before the meeting with survivor groups. "To make it machine-readable would take millions and millions," said Sara Bloomfield, the museum's director. "We don't have the time." Instead, said Michael Haley Goldman, the director of the museum registry, the priority would be to answer survivor questions, with trained staffers searching through the material.
That raises the obvious question: if the files are not “searchable,” what will the trained staffers search? What has the staff at Bad Arolsen been searching for years? Answer: they, of course, won’t search jpegs of documents, because jpegs (Joint Photographic Experts Group files) are mere picture images of documents that are not easily translated into raw text. Instead they will search the databases common to virtually all image management systems used by banks, historical archives and government repositories. All of Bad Arolsen’s jpegs are in fact indexed in some way in a relational database.
By typing key words such as name, birth city and birthdate, a researcher prompts the database to provide candidate images of documents. Trial and error will narrow the jpeg documents, eliminating those of similar name or circumstance until the right person is matched. This elimination can take moments or perhaps hours, depending upon the recollection and details punched into the database.
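To make the idea of searching an index rather than the images concrete, here is a minimal sketch in Python. It is not the Bad Arolsen system: the table name, column names, file names and birth cities are all invented for illustration, and only the sample names and birthdates echo the ITS example quoted later in this article. It shows how a keyed lookup returns candidate image files, and how each added detail narrows the list.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory table mapping indexed metadata to image files."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_index ("
        "surname TEXT, given_name TEXT, birth_date TEXT, "
        "birth_city TEXT, image_file TEXT)"
    )
    conn.executemany("INSERT INTO image_index VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def find_candidates(conn, surname, birth_date=None, birth_city=None):
    """Return image files for a surname, narrowed by any extra details given."""
    query = "SELECT image_file FROM image_index WHERE surname = ?"
    params = [surname]
    if birth_date is not None:
        query += " AND birth_date = ?"
        params.append(birth_date)
    if birth_city is not None:
        query += " AND birth_city = ?"
        params.append(birth_city)
    query += " ORDER BY image_file"
    return [row[0] for row in conn.execute(query, params)]

# Hypothetical rows for illustration only; not real archive data.
SAMPLE_ROWS = [
    ("Rosenbaum", "Edgar", "1898-03-03", "Berlin", "img_0001.jpg"),
    ("Rosenbaum", "Eduard", "1911-01-02", "Wien", "img_0002.jpg"),
    ("Rosenbaum", "Edwin", "1912-03-12", "Berlin", "img_0003.jpg"),
]

conn = build_index(SAMPLE_ROWS)
all_rosenbaum = find_candidates(conn, "Rosenbaum")
narrowed = find_candidates(conn, "Rosenbaum", birth_date="1912-03-12")
```

In this sketch the surname alone returns every candidate image, while adding the birthdate leaves a single file. The images themselves are never read; only their index entries are.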
The exact details of the database systems were published as a Cutting Edge News exclusive report in August, 2007, after an exhaustive month-long international effort, including obtaining written descriptions from the technology chief at Bad Arolsen about his own systems, and consultation with numerous computer resource experts.
An August, 2007, Bad Arolsen written summary of its two data systems, obtained by Cutting Edge News, follows:
CNI (Central Name Index)
The ITS Central Name index was originally a paper index file, sorted in an alphabetical-phonetical way that matches the requirements of the ITS work. It was built (in paper) from the early 1950´s to 1998 and contains in its physical form 42 million cards related to 17 million identities. Between 1998 and 2000 the complete CNI was digitized. To maintain the operational status of the Organisation during the 2 year scanning-work and the following digitisation process, the CNI was digitized in the same order than the paper file. An additional 8 million pieces of information and 0.5 million identities were added to the digital CNI since the year 2000. Due to the absence of a standardized format and because most of the original cards were hand-or typewriter written, only 8-10% of the images could be OCR´ed [Optical Character Read].
To search the CNI database, the operator types a name, a surname and a date of birth into the GUI-front end [Graphical User Interface]. He will be led to the first possible match for the name, surname and date of birth. From this point he can leaf manually through the images that lie physically “behind” the first image.
Search: Rosenbaum, Edwin, 12.03.1912
Result: Rosenbaum, Edgar, 03.03.1898 (Image w/metadata)
(manually) Leaf one forward: Rosenbaum, Edgar, 08.12.1911 (only image)
(manually) Leaf one forward: Rosenbaum, Eduard, 01.10.1899 (only image)
(manually) Leaf one forward: Rosenbaum, Eduard, 02.01.1911 (only image)
(manually) Leaf one forward: Rosenbaum, Edwin, 12.03.1912 (only image)
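The lookup-then-leaf procedure in the example above can be sketched roughly as follows. This is an illustration only: the ITS summary does not describe how its alphabetical-phonetical ordering is computed, so the `coarse_key` function below (surname plus the first two letters of the given name) is a hypothetical stand-in, chosen so that Edgar, Eduard and Edwin fall into one group as they do in the example.

```python
from bisect import bisect_left
from datetime import date

# Cards in file order, taken from the example above (day.month.year).
CARDS = [
    ("Rosenbaum", "Edgar", date(1898, 3, 3)),
    ("Rosenbaum", "Edgar", date(1911, 12, 8)),
    ("Rosenbaum", "Eduard", date(1899, 10, 1)),
    ("Rosenbaum", "Eduard", date(1911, 1, 2)),
    ("Rosenbaum", "Edwin", date(1912, 3, 12)),
]

def coarse_key(surname, given_name):
    """Hypothetical grouping key; the real phonetic rules are not published here."""
    return (surname.lower(), given_name[:2].lower())

def first_possible_match(cards, surname, given_name):
    """Binary-search the sorted file for the first card in the matching group."""
    keys = [coarse_key(s, g) for s, g, _ in cards]
    return bisect_left(keys, coarse_key(surname, given_name))

def leaf_forward(cards, start, target):
    """Step through cards one by one from start; return steps taken, or None."""
    for steps, card in enumerate(cards[start:]):
        if card == target:
            return steps
    return None

target = ("Rosenbaum", "Edwin", date(1912, 3, 12))
start = first_possible_match(CARDS, "Rosenbaum", "Edwin")
steps = leaf_forward(CARDS, start, target)
```

Under these assumptions the search lands on the first Edgar Rosenbaum card, and the operator leafs forward four times to reach Edwin Rosenbaum, matching the sequence in the ITS example.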
SIMS (Simple Image Management System)
The SIMS Database was developed after 2000 to digitize the physical documents located at the archives. For this, a different approach was chosen. The ITS search scanned original documents by name, surname and date of birth too, but they are ordered according to archival units and without the alphabetical-phonetical system.
The Incarceration Collection of the archives has been digitised and stands ready for export by the end of the month of August 2007. The majority of the archival units contained therein, like “CC Buchenwald Men”, “CC Dachau” etc. are fully indexed. Some other units (that are rarely used by the ITS) are 4% indexed (every 25th image). Lists are, at this point, not name-indexed.
Thus, while the CNI database is mainly an index file for tracing names and their possible different spellings, the SIMS-System gives access to the digitised documents containing relevant information about the individual.
When Bloomfield spoke to the JTA reporter, she knew that the document images were searchable not as “machine-readable jpegs” but as image components in a vast database. The database is now in XML, a universally readable, Internet-ready format. It can be transferred anywhere on a hard drive. More than that, Bloomfield knew when speaking to the JTA that Bad Arolsen data searches could be done from any authorized computer terminal anywhere in the world. In May 2007, an ITS letter on the subject explained that “Option 3” for data transfer was no transfer at all, but merely a simple and immediate remote access to its own databases, which could be completed within about three to four months rather than years. The Bad Arolsen statement obtained by The Cutting Edge News states:
IC/ITS choice for Database Transfer
In May 2007 the IC/ITS (International Commission for the International Tracing Service) met in Amsterdam to decide on the method of giving a copy to each member state desiring to receive one.
Three options were mentioned:
Option 1: Complete replication of Hard- and Software and the database for each member state.
Option 2: Export of the data in an standardized data format (XML), so each member state can easily import the data in their own data system.
Option 3: Access of the member states to the ITS Database via VPN or a similar technique.
The IC/ITS opted for option 2 and the ITS is now implementing this option.
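As a rough illustration of what Option 2 implies, the sketch below exports an index record to XML and reads it back using Python's standard library. The element names, field names and sample record are hypothetical; the ITS statement does not describe its actual export schema.

```python
import xml.etree.ElementTree as ET

def export_records(records):
    """Serialize a list of dict records to an XML string."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record")
        for field, value in rec.items():
            ET.SubElement(node, field).text = value
    return ET.tostring(root, encoding="unicode")

def import_records(xml_text):
    """Parse the XML string back into a list of dict records."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in node}
        for node in root.findall("record")
    ]

# Hypothetical record for illustration only; not real archive data.
SAMPLE = [
    {
        "surname": "Rosenbaum",
        "given_name": "Edwin",
        "birth_date": "1912-03-12",
        "image_file": "img_0003.jpg",
    },
]

xml_text = export_records(SAMPLE)
round_tripped = import_records(xml_text)
```

Because the receiving side needs only an XML parser, any institution could load such an export into its own data system, which is the point of a standardized format.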
Just as 24 USHMM staffers sitting in the Washington Museum can access those databases, trying to communicate with elderly victims across America via phone, fax and letter in a fashion bound to continue the legendary backlogs, so could a USHMM staffer or indeed another institution’s staffer sitting in New York, Florida or California. For example, a terminal could be set up at the Center for Jewish History in New York, the Jewish Division of the New York Public Library, the Center for Holocaust and Human Rights Education at Florida Atlantic University in Boca Raton, the spacious Sherman Library at Nova Southeastern University in Ft. Lauderdale, the Greater Palm Beach Jewish Federation or the American Jewish University in Los Angeles. If only one of those 24 USHMM staffers could be stationed in Brooklyn, Miami, Los Angeles, Detroit or the other locales where survivors are congregated, months could be reduced to minutes for every search.
Asked why the International Tracing Service and the USHMM chose not to place all Bad Arolsen files on the Internet, a Red Cross official with direct access to the decision-making process replied, “Don’t ask me. Technically it will [be] feasible to access these databases from anywhere in the world. We would just export to XML format. We could then support a virtually unlimited number of remote terminals. Member countries would not receive copies, just access. This option was not taken.”