Creating a Registry of U.S. Federal Government Documents

 
Objectives: To create a metadata registry for the comprehensive corpus of U.S. federal documents, recording for each document whether it has been digitized.
 
Project scope description: The Registry is intended to include metadata for the comprehensive corpus of U.S. federal documents. This will include materials produced at U.S. government expense, in all formats, at the item level, from 1789 to the present. It may also include works such as grant-funded or contract work, declassified materials, individual pieces of legislation (bills), administrative publications, and/or numerical data sets.
 

Justification/Background

At the HathiTrust Constitutional Convention, October 2011, partners gave overwhelming support for a ballot proposal to provide “Expanded coverage & enhanced access to U.S. Government Documents.” The nature of the proposed work will be decided by a group process determined by the Board of Governors, but key elements include:

  • Facilitating “collective action to create a comprehensive digital corpus of U.S. federal publications including those issued by GPO and other federal agencies.”
  • Coordinating “operational plans and a business model to further and sustain coordinated digitization, ingest, and display of U.S. federal publications including those issued by GPO and other federal agencies.”
  • And “that HathiTrust develop a process to implement enhanced access protocols to fully realize the potential of a comprehensive corpus of U.S. federal publications including those issued by GPO and other federal agencies.”

Perhaps the most significant impediment to creating a comprehensive corpus of US federal publications is the absence of a reliable inventory of items in the corpus. In discussions related to digitizing government documents, participants recognize that even the promising inductive strategy of relying on the catalogs of regional depository libraries falls short, for three reasons. Many, perhaps all, regional depository libraries have not cataloged their collections comprehensively. Records exist at the bibliographic level rather than the volume level (e.g., five bibliographic records submitted by Michigan’s Law Library corresponded to more than 7,000 volumes). Many US federal government publications are cataloged as serials but are regarded by users and librarians as monographs. In short, the chaos inherent in this non-inventory has led informed individuals in digitization discussions to produce estimates that range from 1.8m to 2.2m volumes, and to estimate average volume page counts from 60 pages to over 300 pages (a spread of more than 500m pages between the lowest and highest totals). Moreover, the absence of an inventory makes it impossible to correlate the more than 400,000 documents currently in HathiTrust with the total corpus, or to coordinate collective effort across a group of institutions.
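
The size of that uncertainty can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and treats "over 300 pages" as exactly 300, so the upper total is a floor rather than a precise value; the variable names are illustrative.

```python
# Volume and page-count estimates quoted in the paragraph above.
low_volumes, high_volumes = 1_800_000, 2_200_000
low_pages_per_volume, high_pages_per_volume = 60, 300  # "over 300" treated as 300

# Lowest and highest plausible totals for the corpus, in pages.
low_total = low_volumes * low_pages_per_volume
high_total = high_volumes * high_pages_per_volume
spread = high_total - low_total

print(f"lowest total:  {low_total:,} pages")   # 108,000,000
print(f"highest total: {high_total:,} pages")  # 660,000,000
print(f"spread:        {spread:,} pages")      # 552,000,000
```

The highest estimate is roughly six times the lowest, which is why planning digitization work without an inventory is so difficult.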

Primary Audience

The library community, government agencies, other organizations and project staff

Purposes

  • Inform potential digitization projects

  • Inform potential deaccessioning projects

  • Inform other collection development and management projects, such as shared print initiatives

Secondary Audience

Researchers/General public

Purposes

  • Identify materials distributed by a given agency, for research purposes

  • Identify materials distributed on a given topic, or within a given time period

  • Identify volumes in a given series, and where they can be found

Constraints

  • The quality of metadata for US federal materials is inconsistent, which has the potential to lead to false matching and/or metadata duplication

  • Metadata for federal materials may not yet exist
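
The first constraint can be illustrated with a small sketch. The records, field names, and the `match_key` function below are all hypothetical and are not the registry's actual matching process; they only show how inconsistent metadata can produce both duplication and false matches.

```python
import re

def match_key(record: dict) -> tuple:
    """Hypothetical match key: lowercased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    title = " ".join(title.split())  # collapse runs of whitespace
    return (title, record["year"])

# Made-up records. A and B describe the same item with inconsistent metadata;
# C is a different item from another agency that shares the title and year.
record_a = {"title": "Annual Report of the Secretary.", "year": "1901",
            "agency": "Department of the Interior"}
record_b = {"title": "ANNUAL REPORT OF THE SECRETARY", "year": "1901",
            "agency": "Department of the Interior"}
record_c = {"title": "Annual report of the Secretary", "year": "1901",
            "agency": "Department of the Treasury"}

# Exact comparison misses the A/B pair, so the item would be registered twice.
print(record_a["title"] == record_b["title"])      # False

# A normalized key merges A and B, but it also merges A and C: a false match.
print(match_key(record_a) == match_key(record_b))  # True
print(match_key(record_a) == match_key(record_c))  # True
```

Adding more fields to the key (agency, for example) reduces false matches, but only if those fields are themselves recorded consistently.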

Assumptions

  • The registry will include those publications distributed to Federal Depository Libraries as well as other materials created and/or distributed by GPO and other federal agencies

  • The registry will include materials in languages other than English

  • The registry will include materials that may be protected by copyright

  • Metadata will need to be created for some materials

The first phase of the registry project will focus on US federal documents distributed to Federal Depository Libraries. Future phases will address questions such as:

  • Should works such as grant-funded or contract work, declassified materials, individual pieces of legislation (bills), administrative publications, and/or data/data sets be included?

Project Updates

Timeframe

The project has been extended to three years. An updated outline of project milestones is given below:

Phase One - Planning (April-October 2013)

  • Define project scope and initial project plan (Status: Completed)

  • Identify initial sources of metadata (Status: Completed)

  • Develop initial list of agency names and variants (Status: Completed)

  • Determine metadata elements to include (Status: Completed)

  • Gather community input on proposed functionality; see Summary of Focus Group Comments, Fall 2013, and Use Cases, updated following Focus Group feedback (Status: Completed)

  

Phase Two - Analysis & Requirements (September 2013-September 2014)

  • Acquire metadata from external sources (Status: In process - ongoing)

  • Analyze potential processes for determining duplication among metadata records (Status: Completed)

  • Analyze potential processes for determining the existence of gaps in the registry (Status: Completed)

  • Draft functional requirements (Status: Completed)
  

Phase Three - Development (August 2014-December 2015)

  • Develop processes for determining duplication among metadata records (Status: Completed)

  • Develop processes for determining the existence of gaps in the registry (Status: Completed)

  • Build the registry framework (Status: Completed)

  • Build user interfaces (Status: Completed)

  • Test and evaluate internally (Status: Completed)

  • Provide a public alpha to gather external feedback on use of the registry (Status: Completed)

  • Incorporate feedback into production system (Status: Completed)
  

Phase Four - Production/Launch (January-July 2016)

  • Provide a beta for public use (Status: Completed)

  • Continue to incorporate feedback into the system (Status: In process - ongoing)

  • Build APIs / make data available for export (Status: On hold)

  • Conduct assessment based on previously defined success criteria (Status: Completed)

  • Continue to refine processes for duplicate and gap detection (Status: In process - ongoing)

  • Develop ongoing processes for maintenance and updating (Status: Completed)

 

 
