The first year of work on the HathiTrust US Government Documents Registry focused primarily on research and analysis. The strategy was to develop processes such as relationship detection before making the Registry publicly available.
Project staff:
Valerie Glenn was hired as the Government Documents Registry Analyst, and she began work in April 2013. Several individuals in the University of Michigan Library Systems Office devoted time to assist with the Registry project, particularly Jon Rothman, Head.
Project planning:
Staff created several planning documents for registry development, including the project’s scope statement, a detailed timeline, and success criteria.
Metadata analysis:
Staff identified and analyzed an initial set of existing repositories and/or sources of US government documents metadata, documenting information such as scope, format, government agency/agencies covered, and completeness/comprehensiveness of metadata.
In a related effort, bibliographic metadata for items marked as US federal government documents was acquired from the University of Michigan, the University of Minnesota, and the Committee on Institutional Cooperation. In response to an open call for catalog records, we received metadata from an additional 44 institutions. A temporary metadata store was developed in order to analyze the initial three sets of non-de-duplicated records.
Staff compiled a list of known federal government agencies from 1789 to the present and made it publicly available for comment. Information such as dates of operation and SuDoc classification (both the classification itself and the dates it was in use) was documented when available. Later, staff analyzed both VIAF and the Library of Congress’ Name Authority File and determined that the Registry does not need an additional authority file.
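As an illustration only, the kind of information recorded for each agency (dates of operation, plus SuDoc classifications with the dates each was in use) could be modeled as below. This is a minimal sketch, not the project's actual list format: the class names, field names, and the sample agency and SuDoc stems are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SuDocAssignment:
    """One SuDoc classification stem and the years it was in use (hypothetical shape)."""
    stem: str
    first_year: int
    last_year: Optional[int] = None  # None means no end date was documented

    def covers(self, year: int) -> bool:
        # True when the year falls inside the documented range.
        return self.first_year <= year and (
            self.last_year is None or year <= self.last_year
        )


@dataclass
class Agency:
    """An agency with its dates of operation and dated SuDoc stems (hypothetical shape)."""
    name: str
    established: int
    abolished: Optional[int] = None  # None means no end date was documented
    sudoc_assignments: List[SuDocAssignment] = field(default_factory=list)

    def stems_in_use(self, year: int) -> List[str]:
        # Return every stem whose documented range includes the given year.
        return [a.stem for a in self.sudoc_assignments if a.covers(year)]


# Made-up sample data, for illustration only.
agency = Agency(
    name="Example Agency",
    established=1900,
    sudoc_assignments=[
        SuDocAssignment("X 1", 1900, 1949),
        SuDocAssignment("X 2", 1950),
    ],
)
```

Keeping the date range on each classification, rather than on the agency alone, makes it possible to ask which stem applied in a given publication year.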
Registry Framework:
Use cases were developed to aid in the creation of a Registry requirements document, and project staff held five focus groups to gather feedback on them. Following the focus groups, the use cases were edited and re-prioritized. A requirements document was drafted and has been continually refined.
Project staff developed a data model for the Registry, indicating the information that will be stored about an item and determining the minimum elements a Registry record must include (title, agency OR SuDoc number, publication date). They also defined the types of relationships among items and among government agencies, and decided to focus on showing the relationships between items (i.e., duplicate records, records for items in the same series) rather than between agencies.
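The minimum-elements rule above (title, agency OR SuDoc number, publication date) can be expressed as a small validation check. The following is a sketch under stated assumptions: the record class, its field names, and the function name are hypothetical and do not come from the Registry's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegistryRecord:
    """Hypothetical record shape; field names are illustrative only."""
    title: Optional[str] = None
    agency: Optional[str] = None
    sudoc_number: Optional[str] = None
    publication_date: Optional[str] = None


def _present(value: Optional[str]) -> bool:
    # A field counts as present only if it holds non-blank text.
    return value is not None and value.strip() != ""


def missing_minimum_elements(record: RegistryRecord) -> List[str]:
    """Return the minimum elements the record lacks (empty list if it qualifies)."""
    missing = []
    if not _present(record.title):
        missing.append("title")
    # Either an agency or a SuDoc number satisfies this requirement.
    if not (_present(record.agency) or _present(record.sudoc_number)):
        missing.append("agency or SuDoc number")
    if not _present(record.publication_date):
        missing.append("publication date")
    return missing
```

Returning the list of missing elements, rather than a bare true/false, would let incomplete records be reported with the specific reason they fall short.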
Relationship and Gap Detection:
Considerable effort has been devoted to relationship detection: identifying duplicate records, sibling members of a set, and parent-child hierarchy records among our current metadata holdings. Staff developed potential methods and strategies for identifying related records in the Registry and tested them on the non-de-duplicated data. These methods continue to be refined in order to reduce the number of records that will need manual review.
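The report does not describe the specific detection methods, so the following is only one common approach to finding candidate duplicates, not the project's method: group records by a normalized match key and flag any group with more than one member. The record fields (`id`, `title`, `sudoc`, `date`) and function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and replace each run of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(record: Dict[str, str]) -> Tuple[str, str, str]:
    # Hypothetical key: normalized title and SuDoc number plus the date as given.
    return (
        normalize(record.get("title", "")),
        normalize(record.get("sudoc", "")),
        record.get("date", "").strip(),
    )


def candidate_duplicates(records: List[Dict[str, str]]) -> List[List[str]]:
    """Group record ids that share a match key; keep only groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample records, for illustration only.
records = [
    {"id": "a", "title": "Annual Report.", "sudoc": "X 1.1:", "date": "1950"},
    {"id": "b", "title": "annual  report", "sudoc": "X 1.1", "date": "1950"},
    {"id": "c", "title": "Annual Report", "sudoc": "X 1.1:", "date": "1951"},
]
print(candidate_duplicates(records))
```

Exact matching on a normalized key is deliberately crude: stripping punctuation can conflate distinct SuDoc numbers, and variant titles will be missed, so groups produced this way are candidates for review rather than confirmed duplicates.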
The Registry’s metadata will have gaps. Work is underway to identify initial methods and strategies for detecting them.