ACM - Computers in Entertainment

What USC Shoah Foundation's Archive Birthed

By Marty Perlmutter


After Schindler’s List reached theaters in 1993, Steven Spielberg’s phone started to ring. Holocaust survivors called, offering to tell their stories. When the calls became a tsunami, Spielberg made a long-term commitment: he would spend whatever it took to capture their accounts. Recording began in 1994, and ultimately it cost roughly $300 million to shoot 234,979 videotapes of 51,686 interviews (a number that now approaches 54,000) and to store them securely. The managers of the Shoah Foundation soon realized that they would have to take steps to preserve this vital collection, and that they also had to make the enormous database searchable.

Sam Gustman, then CTO of the Shoah Foundation, was a technologist with a background in Geographic Information Systems who understood how to make huge collections of data searchable by attaching keywords, latitude and longitude, and other markers. So the Shoah Foundation’s vast effort to build a fully searchable database of Holocaust testimony took shape. Alongside it was the matter of preservation.
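The indexing idea described above, tagging each stretch of testimony with keywords and a latitude/longitude, can be sketched as an inverted index plus a great-circle distance filter. This is a hypothetical illustration: the `Segment` fields, the `SegmentIndex` class and the example coordinates are invented here and are not the Foundation’s actual schema.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One indexed stretch of a testimony (hypothetical schema)."""
    testimony_id: int
    start_minute: int
    keywords: set[str]
    lat: float  # location the segment describes, in decimal degrees
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards against rounding above 1


class SegmentIndex:
    """Inverted index from keyword to segments, with an optional distance filter."""

    def __init__(self) -> None:
        self.by_keyword: dict[str, list[Segment]] = {}

    def add(self, seg: Segment) -> None:
        for kw in seg.keywords:
            self.by_keyword.setdefault(kw, []).append(seg)

    def search(self, keyword: str, near=None, radius_km: float = 50.0) -> list[Segment]:
        hits = self.by_keyword.get(keyword, [])
        if near is None:
            return list(hits)
        lat, lon = near
        return [s for s in hits if haversine_km(lat, lon, s.lat, s.lon) <= radius_km]


# Illustrative data: one segment tagged near Kraków, one near Amsterdam.
idx = SegmentIndex()
idx.add(Segment(1, 12, {"hiding", "ghettos"}, 50.06, 19.94))
idx.add(Segment(2, 40, {"hiding"}, 52.37, 4.90))
print([s.testimony_id for s in idx.search("hiding")])  # [1, 2]
print([s.testimony_id for s in idx.search("hiding", near=(50.0, 20.0), radius_km=100)])  # [1]
```

A real archive index would add many more marker types (names, dates, camps), but keyword lookup narrowed by geography captures the GIS-style approach the paragraph describes.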

Videotapes shot in 1994 were expected to show their age by 2014. What was the right medium in which to preserve this trove? Duplicating the collection on film would cost $120 million; on videotape, $20 million. In 2000, the Joint Photographic Experts Group published a new compression standard, JPEG 2000, whose lossless mode conserves every detail of the original recording. As an international standard it also promised interoperability, and it became a standard choice for the digital conservation of files. Capturing all the terabytes in the Shoah Foundation’s collection this way cost a modest $8 million. Here was a path to the future.

The Foundation decided to build two systems, one for file conversion and one for preservation. By 2006, 12,000 videotapes showed detectable damage, and the Foundation devised a process for finding and correcting it. In the hands of skilled technologists and a dedicated team of 60 archivists who pored over every frame, even the most extreme image loss became correctable. Video ranging from a single damaged video head (yielding rapid flashes of incoherent noise) to what appeared to be total static became readable and visually perfect. This was the work of years.

A wealth of educational possibilities was apparent from the start, and the Foundation began outreach to K-12 schools in 2000. To house the digital archive and maximize its reach, the Foundation decided in 2006 to move from its base at Universal Studios (Spielberg’s creative home) to the University of Southern California. At USC, the demanding work of preserving the archive continued as the collection grew to include genocides beyond the Holocaust. Today the archive includes testimonies from Cambodia, Rwanda, Armenia, Darfur and Guatemala, among others.

Sam Gustman also took on development of the USC Libraries’ digital archive alongside the Shoah Foundation’s. The USC data center holds two copies of every file, and the entire collection is mirrored at Clemson University in South Carolina and in Prague, so there are now four copies of every digital file.
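Keeping four copies pays off only if the copies are compared. A minimal sketch of one way to do that, assuming each site can hand back the bytes of a given file: hash every copy with SHA-256, treat the digest shared by a strict majority as correct, and overwrite any copy that disagrees. The site names, the choice of SHA-256 and the majority rule are assumptions for illustration, not the Foundation’s documented procedure.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Compare copies of one file held at several sites; overwrite any copy whose
    digest disagrees with the majority. Returns (repaired copies, sites repaired)."""
    digests = {site: sha256_hex(data) for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # e.g. a 2-2 split among four copies: no safe automatic choice
        raise RuntimeError("no majority copy; needs manual review")
    good = next(data for site, data in replicas.items() if digests[site] == winner)
    fixed, repaired = {}, []
    for site, data in replicas.items():
        if digests[site] == winner:
            fixed[site] = data
        else:
            fixed[site] = good  # re-copy from a verified replica
            repaired.append(site)
    return fixed, repaired


# Hypothetical site names; one copy has a single flipped byte.
ok, bad = b"testimony frame data", b"testimony frame dbta"
_, repaired = repair_replicas({"usc-a": ok, "usc-b": ok, "clemson": bad, "prague": ok})
print(repaired)  # ['clemson']
```

With four copies, a single bad replica is always outvoted three to one, which is one practical argument for keeping more than two.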

The archive aims to “touch every tape” (that is, each 8-terabyte Linear Tape-Open, or LTO, cartridge) every six months, automatically. A robot continuously pulls LTO cartridges, loads them into a drive and verifies checksums during playback to catch even the slightest data perturbation. If playback deviates from perfect in any way, that piece of storage is immediately discarded. No medium stays in service for more than three years, and every piece of the archive is examined every six months.
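The schedule just described reduces to two rules per cartridge: retire it if it failed verification or has been in service for more than three years; otherwise queue it for a check once six months have passed since its last pass. A minimal sketch, assuming a hypothetical `Cartridge` record and approximating six months as 182 days; the article does not describe the robot controller itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta

CHECK_INTERVAL = timedelta(days=182)        # "every six months", approximated
MAX_SERVICE_LIFE = timedelta(days=3 * 365)  # no medium in service beyond ~3 years


@dataclass
class Cartridge:
    label: str                   # hypothetical identifier
    put_in_service: date
    last_verified: date
    last_check_clean: bool = True


def triage(carts: list[Cartridge], today: date) -> tuple[list[str], list[str]]:
    """Split cartridges into those to retire now and those due for verification."""
    retire, due = [], []
    for c in carts:
        if not c.last_check_clean or today - c.put_in_service > MAX_SERVICE_LIFE:
            retire.append(c.label)  # migrate data from a good replica, then discard
        elif today - c.last_verified >= CHECK_INTERVAL:
            due.append(c.label)     # queue for the robot to load and re-checksum
    return retire, due


carts = [
    Cartridge("LTO-0001", date(2015, 1, 1), date(2019, 3, 1)),   # too old
    Cartridge("LTO-0002", date(2018, 1, 1), date(2018, 11, 1)),  # check overdue
    Cartridge("LTO-0003", date(2018, 1, 1), date(2019, 5, 1)),   # recently checked
    Cartridge("LTO-0004", date(2019, 1, 1), date(2019, 5, 1), last_check_clean=False),
]
print(triage(carts, date(2019, 6, 1)))  # (['LTO-0001', 'LTO-0004'], ['LTO-0002'])
```

Retirement takes priority over checking, so a failed or aged cartridge is never simply re-verified and kept.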

No wonder Warner Bros. Pictures came to USC Shoah Foundation when it wanted long-term, secure storage for its digital archive. Preservation in the digital domain, it turns out, is all about migration: continuously checking data and moving it to fresh media.

The seventh-fastest supercomputer in academia is part of this system. Operating at 750 teraflops, in Gustman’s words “it treats petabytes like terabytes,” combing through 100,000 hours of video each month in search of the tiniest error.

Extreme Networks equipment and EMC Isilon servers move data across a 100-gigabit-per-second network between storage nodes. This network dwarfs the capabilities of the system serving Sony’s lot in Culver City. At Sony, moving the 850 TB of an average feature project takes nine weeks; at USC Shoah Foundation, it is the work of a few days.
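The comparison can be checked with simple arithmetic. Reading the 850 figure as terabytes (10^12 bytes each, an assumption) and assuming the 100 Gbit/s link ran at full line rate, the move would take about 19 hours; real transfers lose throughput to protocol and storage overhead, which fits the article’s “few days.” The same arithmetic says nine weeks at Sony corresponds to an average of roughly 1.25 Gbit/s. The helper names below are illustrative.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link running at a fraction of its line rate."""
    return size_bytes * 8 / (link_bits_per_s * efficiency)


def effective_rate_bps(size_bytes: float, seconds: float) -> float:
    """Average throughput implied by moving size_bytes in the given time."""
    return size_bytes * 8 / seconds


TB = 10**12
project = 850 * TB  # assumes the article's "850" means terabytes

ideal = transfer_seconds(project, 100e9)  # 100 Gbit/s at full line rate
print(f"{ideal / 3600:.1f} hours")  # 18.9 hours

nine_weeks = 9 * 7 * 24 * 3600
print(f"{effective_rate_bps(project, nine_weeks) / 1e9:.2f} Gbit/s")  # 1.25 Gbit/s
```

Passing an `efficiency` below 1.0 (say 0.3 for a heavily loaded storage path) shows how the ideal figure stretches toward days.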

The Shoah Foundation seems to have long-term preservation covered. So what? Who gets to use it?

The Shoah Foundation’s IWitness website-building tools let K-12 students access the entire database (at compressed quality), searchable in 44 languages. Students are first taught “ethical editing” practices, then permitted to assemble a project that tells whatever story captures their interest.

The Foundation is also moving beyond the web we know. Skokie, Illinois, home to a large population of Holocaust survivors, has a Holocaust museum. Working with USC’s Institute for Creative Technologies, a hotbed of innovation in immersive technology, the Shoah Foundation has built a display that combines voice recognition and instant branching with what appears to be a three-dimensional representation of a Holocaust survivor who can answer any of 1,200 likely questions from visitors.

A visitor to the Illinois Holocaust Museum in Skokie will soon be able to stand before a survivor and ask, “Where did you hide?” “What camp were you in?” “How did you get food?” “Who else in your family was in the camp?” and many more questions.
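The article does not say how a spoken question is matched to one of the 1,200 recorded answers. One simple way to illustrate the idea, assuming speech has already been transcribed to text, is to compare the visitor’s words with the wording of each recorded question using bag-of-words cosine similarity, falling back to a stock reply below a threshold. This is a swapped-in technique for illustration only: the clip IDs, the 0.3 threshold and the `pick_clip` helper are hypothetical, and a production system would need much stronger language understanding.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased word counts; punctuation is ignored."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_clip(question: str, clips: dict[str, str], threshold: float = 0.3) -> str:
    """Return the ID of the clip whose recorded question best matches, else 'fallback'."""
    q = tokens(question)
    best_id, best_score = "fallback", 0.0
    for clip_id, recorded_question in clips.items():
        score = cosine(q, tokens(recorded_question))
        if score > best_score:
            best_id, best_score = clip_id, score
    return best_id if best_score >= threshold else "fallback"


clips = {  # hypothetical clip IDs mapped to the question each answer was recorded for
    "hiding": "Where did you hide during the war?",
    "camp": "Which camp were you in?",
    "food": "How did you get food?",
}
print(pick_clip("Where did you hide?", clips))           # hiding
print(pick_clip("What camp were you in?", clips))        # camp
print(pick_clip("What is your favorite color?", clips))  # fallback
```

The fallback matters as much as the match: when nothing scores well, an honest “I can’t answer that” clip is better than a confidently wrong story.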

Meanwhile, in Iron Mountain’s storage facility in the hills of Pennsylvania, an acre of Betacam tapes lies in repose. They were delivered by 19 fully loaded tractor-trailers, each carrying 15,000 tapes. They sit in temperature-controlled caves, their rot slowed by a constant environment. But rot they will. In Los Angeles, at Clemson University and in Prague, by contrast, their digitized content is accessed securely and instantly, stored in four distinct data formats including lossless JPEG 2000. That content is paired with tools for assembling educational presentations, and now with 3D rendering of interactive survivors who tell their stories while displaying the powers of the next generation of technology: cognitive computing.

With a deep bow to the past, the Shoah Foundation has pioneered durable methods for preserving memory while providing powerful means for using its lessons in the future.

Copyright © 2019. All Rights Reserved