November 5, 2010
Functional Objectives – Short-term
- Page turner mechanism: HathiTrust supports an application for reading, downloading, and interacting with (e.g., zooming and rotating) texts and images in HathiTrust. The page turner application interfaces with mechanisms such as the Rights Database and Shibboleth (a mechanism for inter-institutional authentication) to provide appropriate access to materials, and integrates with services such as the Collection Builder, full text search, and the bibliographic catalog.
- Branding (overall initiative; individual libraries): HathiTrust supports branding in the repository in a number of ways:
  - The page turner prominently identifies the HathiTrust initiative.
  - A watermark on every page identifies the digitizing agent.
  - A watermark on every page identifies the source library of the print material.
  - The source of the print material is included in our feed of bibliographic identifiers so that institutions can import or update records with this information.
  - The page turner contains institution-specific branding, identifying to users at partner institutions that their institution is a member of HathiTrust.
- Format validation, migration and error-checking: Format validation and error-checking are performed for all content that enters HathiTrust. Although no migration of content has been necessary to date, we believe that we have mitigated the need for it by choosing rich, flexible, standards-based formats. HathiTrust stores a variety of technical and digital preservation metadata along with each object in order to aid in migration should it become necessary. Strategies are in place to ensure and validate the integrity of HathiTrust materials on an ongoing basis.
- Development of APIs that will allow partner libraries to access information and integrate it into local systems individually: Several APIs have been released for this purpose. Two key examples are a bibliographic API (Bib API), which supports lookup and catalog integration, and a data API (Data API), which provides machine access to the underlying data in a digital object. Information on all modes of content and metadata distribution (including OAI and tab-delimited metadata files) can be found at http://www.hathitrust.org/data.
- Access mechanisms for persons with disabilities: HathiTrust has deployed an accessible interface, optimized for use with screen readers, that uses descriptive labeling, key tabs, and other strategies to facilitate navigation and use by users with print disabilities. HathiTrust has also deployed authorization mechanisms that permit users who are certified as having print disabilities to access the full text of public domain and in-copyright volumes in HathiTrust. These mechanisms, which have been deployed at the University of Michigan, are sufficiently generalized to provide access at partner institutions pending agreement on entitlement attributes (to be used in connection with Shibboleth) and institutional policies. A CIC working group chaired by Mark Sandler has initiated work to help address these needs.
- Public ‘Discovery’ Interface for HathiTrust: HathiTrust released a temporary public version of a comprehensive bibliographic search application (i.e., a catalog) in April 2009 and has worked through a collective process to define a HathiTrust view in WorldCat. The WorldCat implementation of the HathiTrust catalog will be released as a pilot in November 2010.
- Ability to publish virtual collections: HathiTrust has created a Collection Builder application that permits individuals to create public (i.e., shared) and private collections. Collection Builder uses Shibboleth authentication for users at partner institutions, but also permits authentication through the University of Michigan “friend” system so that unaffiliated users can create and maintain collections.
- Mechanism for direct ingest of non-Google content: In April 2010, HathiTrust developed automated ingest mechanisms for book and journal content digitized by the Internet Archive. A technical and policy framework for ingest of other digitized book and journal content (e.g., digitized by partner institutions) is currently being finalized. When this is complete, routine ingest of partner content will begin.
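As a rough illustration of the Bib API lookup mentioned in the API objective above, the following Python sketch builds a lookup URL for a single identifier. The base URL, the path layout, and the set of identifier types are assumptions that this report does not state, and the function name is hypothetical; the documentation at http://www.hathitrust.org/data is the authoritative reference.

```python
from urllib.parse import quote

# Assumed base URL and path layout; neither is stated in this report.
BIB_API_BASE = "http://catalog.hathitrust.org/api/volumes"

# Assumed identifier types accepted by the lookup service.
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type, value, detail="brief"):
    """Build a lookup URL for one identifier (hypothetical helper)."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    if id_type not in ID_TYPES:
        raise ValueError("unsupported identifier type: " + id_type)
    # Percent-encode the identifier so it is safe inside a URL path.
    return "{}/{}/{}/{}.json".format(
        BIB_API_BASE, detail, id_type, quote(str(value), safe="")
    )


if __name__ == "__main__":
    # A local catalog could fetch this URL to check whether HathiTrust
    # holds a volume matching one of its own records.
    print(bib_api_url("oclc", 424023))
```

A local system would request the resulting URL and read the JSON response to decide whether to link a catalog record to a HathiTrust volume; the sketch stops at URL construction so that it makes no claim about the response format.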
Functional Objectives – Long-term
- Compliance with required elements in the Trustworthy Repositories Audit and Certification (TRAC) criteria and checklist: The Center for Research Libraries is conducting an independent assessment of the HathiTrust repository, based largely on the TRAC criteria. The assessment is targeted to be complete by the end of 2010. Information about HathiTrust's compliance with TRAC can be found at http://www.hathitrust.org/standards.
- Robust discovery mechanisms like full-text cross-repository searching: An initial implementation of full-text search of the entire repository was released on November 19, 2009. The launch of this service represented significant research and development, much of which is documented on the HathiTrust website at http://www.hathitrust.org/large_scale_search and http://www.hathitrust.org/blogs/large-scale-search.
- Development of an open service definition to make it possible for partner libraries to develop other secure access mechanisms and discovery tools: HathiTrust has created a number of APIs for this purpose, as well as a collaborative development environment in which partners can improve existing applications and develop new ones.
- Support for formats beyond books and journals: HathiTrust is investigating issues relating to the storage and delivery of electronic publications (in the ePub format in particular) and digital audio and image files (such as maps). Pilot projects in each of these areas are underway.
- Development of data mining tools for HathiTrust and use by HathiTrust of other analysis tools from other sources: HathiTrust is pursuing multiple strategies to support data mining of the repository:
1. Data Distribution: HathiTrust has made sample datasets available to researchers for computational processing and analysis. The purpose of the samples is to give researchers an idea of the structure of the repository ahead of broader distribution of the public domain materials in HathiTrust (planned for early 2011) and ahead of strategy 2 below.
2. SEASR integration: The SEASR development team is in the process of integrating SEASR into HathiTrust as a proof of concept.
3. HathiTrust Research Center: HathiTrust plans to create a Research Center equipped with a variety of tools and services to allow a broad variety of analyses on the repository corpus.