The “hathifiles” are a standard metadata format we use at HathiTrust to distribute information about items in the HathiTrust collection. They include information derived from the bibliographic record (e.g., title, publisher, language, and commonly used identifiers), rights and access codes, and information about the source of the item.
Hathifiles are available as TSV files for the entire collection on the hathifiles page, or in TSV and JSON formats for personal collections.
Many of the fields described below are extracted from the MARC record, a format commonly used in library catalogs to describe a work. Where applicable, each description notes which MARC fields and subfields are included in the data element. See “What is a MARC record, and why is it important?” to learn more about the MARC format.
Notes:
- Multiple occurrences of OCLC number, ISBN, ISSN, or LCCN are comma-delimited within the appropriate data element.
- If there is no corresponding data for a field, the field will be empty.
- The fields are provided in the hathifiles in the order described below.
- When new elements are added to the hathifiles, they are added to the end of the row.
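The notes above could be applied when reading a row roughly as follows. This is a minimal sketch, not an official parser: the function name `parse_row`, the shortened field list, and the sample row (including its OCLC numbers and title) are hypothetical, and it assumes rows are plain tab-separated text with no quoting or escaping.

```python
# Fields that may hold several comma-delimited identifiers (per the notes above).
MULTI_VALUED = {"oclc_num", "isbn", "issn", "lccn"}


def parse_row(line, field_names):
    """Map one tab-separated hathifile row onto the given field names.

    Assumes values contain no quoted or escaped tabs (an assumption,
    not something the documentation states).
    """
    values = line.rstrip("\n").split("\t")
    record = {}
    # zip() stops at the shorter sequence, so columns appended to the end
    # of the row in a later release are ignored by an older field list.
    for name, value in zip(field_names, values):
        if name in MULTI_VALUED:
            # An empty field yields an empty list rather than [""].
            record[name] = [v for v in value.split(",") if v]
        else:
            # An empty field means there is no data for this element.
            record[name] = value or None
    return record


# Hypothetical, shortened field list and row, for illustration only.
fields = ["htid", "oclc_num", "isbn", "title"]
row = "mdp.39015013764785\t1234,5678\t\tAn example title\tfuture_column\n"
record = parse_row(row, fields)
```

Looking fields up by name rather than by position keeps the code readable. Because new elements are only ever added to the end of the row, a field list written against an older release still lines up with the leading columns.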
Data element | Field name in header file | Description |
---|---|---|
Volume Identifier | htid | This is the permanent HathiTrust item identifier. Each item identifier is unique. This identifier can be used to construct a persistent handle URL or other link that directs users to the item. Handles can be constructed as follows: https://hdl.handle.net/2027/volume_identifier (for example, https://hdl.handle.net/2027/mdp.39015013764785) |
Access | access |
An access code that describes whether or not users can view the item. The access code is derived from the rights attribute. Permitted values include:
Notes:
Also see “Rights” and “Access Profile” data elements below. |
Rights code | rights | A code (also referred to as “rights attribute”) that describes the copyright status, license or access. See the full list of codes. |
HathiTrust record number | ht_bib_key | HathiTrust's record number for the associated bibliographic record. HathiTrust record numbers are not permanent and can change over time. URLs to HathiTrust catalog records can be constructed as follows: https://catalog.hathitrust.org/Record/record_number (for example, https://catalog.hathitrust.org/Record/001285647) |
Enumeration/Chronology | description | Enumeration (e.g., “vol.1”) and chronology (e.g., “1883”, “Jun-Oct 1927”) data for this item. |
Source | source | Code identifying the source of the bibliographic record. Currently, this is the NUC code of the originating library. |
Source institution record number | source_bib_num | Local bibliographic record number used in the catalog of the library that contributed the item. |
OCLC numbers | oclc_num | OCLC number(s) for the bibliographic record. Multiple values are separated by a comma. |
ISBNs | isbn | ISBN(s) for the bibliographic record. Multiple values are separated by a comma. |
ISSNs | issn | ISSN(s) for the bibliographic record. Multiple values are separated by a comma. |
LCCNs | lccn | LCCN(s) for the bibliographic record. Multiple values are separated by a comma. |
Title | title | The title of the work. Includes all subfields of the 245 MARC field, so it may include an author if one is provided in 245 $c. |
Publishing information | imprint | The name of the publisher and the date of publication. Includes subfields b and c of the 260 MARC field. |
Rights determination reason code | rights_reason_code | This code describes how the “Rights” code was set. See the full list of Reason Codes. |
Date of last update | rights_timestamp |
This date may change when any of the following activities occur:
|
Government Document | us_gov_doc_flag |
United States federal government document indicator. Permitted values include:
|
Publication Date | rights_date_used | Derived publication date of the item. The date is derived from data provided in the 008 field of the MARC record and the enumeration/chronology data for the item. In cases where the date of the item could not be easily determined by HathiTrust processes, the date will be listed in the hathifiles as 9999. |
Publication Place | pub_place | The place of publication for the work. The codes included in this data element were originally provided in bytes 15-17 of the 008 MARC field. See the full list of country codes in the “MARC Code List for Countries.” |
Language | lang | The primary language of the work. The codes included in this data element were originally provided in bytes 35-37 of the 008 MARC field. See the full list of language codes in the “MARC code list for Languages.” |
Bibliographic Format | bib_fmt |
Bibliographic format of the work. Definitions of format values can be found on the Library of Congress website. Permitted values include:
|
Collection Code | collection_code | An administrative code used to share information between Zephir and the HathiTrust repository.* |
Content Provider Code | content_provider_code | The institution that originally contributed the content. Codes used are listed at https://www.hathitrust.org/institution_identifiers.* |
Responsible Entity Code | responsible_entity_code | The institution that took responsibility for accessioning the content into HathiTrust, in cases where the content provider was not a member of HathiTrust. Codes used are listed at https://www.hathitrust.org/institution_identifiers.* |
Digitization Source | digitization_agent_code | The organization that digitized the content. Codes used are listed at https://www.hathitrust.org/rights_database#Sources.* |
Access profile *ADDED 7/1/2018* |
access_profile_code |
Access profiles indicate whether an item has view or download restrictions. They work in combination with the rights codes (included in the hathifiles in data element “rights”) to determine user access. Permitted values include:
|
Author | author |
The name of the person, company or meeting that created the work. Author names are typically in authorized format, meaning that the name is provided in a standardized form used across multiple catalogs and databases. Includes the following fields from the MARC record:
|
*For more information about codes used in HathiTrust internal processes, see the page at https://www.hathitrust.org/internal_codes.
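
The URL patterns and the `9999` date convention described in the table can be sketched as small helpers. This is an illustrative sketch only: the function names are hypothetical, identifiers are appended to the URL prefixes without any encoding (an assumption), and a value that is not all digits is treated as an unknown date, which goes beyond what the table specifies.

```python
HANDLE_PREFIX = "https://hdl.handle.net/2027/"
CATALOG_PREFIX = "https://catalog.hathitrust.org/Record/"
UNKNOWN_DATE = "9999"  # used when the date could not be easily determined


def handle_url(htid):
    """Build a persistent handle URL from the htid field."""
    return HANDLE_PREFIX + htid


def catalog_url(ht_bib_key):
    """Build a catalog record URL from the ht_bib_key field.

    The key is kept as a string so leading zeros are preserved.
    Record numbers are not permanent, so these links can go stale.
    """
    return CATALOG_PREFIX + ht_bib_key


def publication_year(rights_date_used):
    """Return the year as an int, or None if it is empty, 9999, or non-numeric."""
    if not rights_date_used or rights_date_used == UNKNOWN_DATE:
        return None
    if not rights_date_used.isdigit():
        return None  # assumption: treat unexpected values as unknown
    return int(rights_date_used)
```

For a stable reference to an item, the handle URL built from `htid` is the better choice, because the identifier is permanent while the record number behind the catalog URL can change.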