Preservation Philosophy
HathiTrust is guided by principles of trustworthiness, openness and responsible stewardship. We provide reliable long-term preservation for digital content, with access to the extent legally possible, in ways that maximize the contributions of partner institutions and make the most efficient use of available resources.
Preserved Content
HathiTrust is committed to preserving the intellectual content and in many cases the exact appearance of materials that have been digitized for deposit. This includes:
- Digital representations (images) of content as the content appeared in its original form, with the same layout and color (e.g., for illustrations and artwork), and in the same order
- Textual representations of content where possible through Optical Character Recognition technologies
Preservation Strategies
HathiTrust employs a number of strategies to ensure the long-term integrity of deposited materials. These include:
- Use of standard and open content formats that meet community-accepted digital preservation standards, are widely supported on a number of platforms, and that we are confident can be preserved and migrated forward to new preservation formats over time
  - HathiTrust currently relies on the extensive specifications of file formats, preservation metadata, and quality control methods that are detailed in our Technical Requirements for Digitized Page Images Submitted to HathiTrust.
  - HathiTrust is committed to bit-level preservation and format migration of materials created according to these specifications as technology, standards, and best practices in the library community change.
  - Formats preserved in HathiTrust include TIFF ITU G4 files stored at 600dpi, JPEG or JPEG2000 files stored at several resolutions ranging from 200dpi to 400dpi, Unicode text, and XML files with an accompanying DTD (typically METS).
- Rigorous validation of content on ingest
- Reliance on standards for repository design and trustworthiness such as OAIS and TRAC (see HathiTrust Digital Library and Content Standards)
- Reliance on standards for metadata such as METS and PREMIS (see HathiTrust Digital Object Specifications)
- Regular checks on the integrity of stored content through:
  - Automated system checks that verify the integrity of digital objects against their ingested versions. These are performed on all files on a quarterly schedule.
  - User access, and
  - Repository processes such as full-text indexing that use the content on a regular basis
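
The automated integrity checks described above can be illustrated with a minimal fixity-audit sketch. It is not HathiTrust's actual implementation: the function names, the manifest layout (relative path mapped to a checksum recorded at ingest), and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(manifest: dict[str, str], root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Compare stored files against the checksums recorded at ingest.

    `manifest` is a hypothetical mapping of relative path -> expected hex digest.
    Returns a mapping of relative path -> "missing" or "mismatch";
    an empty result means every listed file matched.
    """
    problems: dict[str, str] = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif file_checksum(target, algorithm) != expected.lower():
            problems[relative_path] = "mismatch"
    return problems
```

A scheduled job could call `audit_fixity` over each stored object and report any non-empty result for repair from a replica. Recording the checksum once at ingest and recomputing it later is what lets the check detect silent corruption instead of only missing files.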
Links for more information
- HathiTrust TRAC documentation - HathiTrust was certified as a Trustworthy Digital Repository by the Center for Research Libraries in March 2011.
- Technological profile
- “From Ingest To Access: A Day In The Life Of A HathiTrust Digital Object” - Repository workflow diagram [PDF] and Notes [PDF]
- “Building a Future by Preserving Our Past: The Preservation Infrastructure of HathiTrust Digital Library” - Paper [PDF] and Presentation [PPT]
- HathiTrust is a Solution: The Foundations of a Disaster Recovery Plan for the Shared Digital Repository [PDF]