Top News
New HathiTrust Partners
HathiTrust is very pleased to welcome Allegheny College (view the full press release) and the University of Alabama as its newest partners.
Executive Director Search
by Brian Schottlaeder, The Audrey Geisel University Librarian, University of California, San Diego and Chair, HathiTrust Board of Directors
On behalf of the HathiTrust Board of Directors, I am pleased to announce that the search for the successor to John Wilkin is underway. The Executive Director position will offer the right individual a unique opportunity to help us advance our collective mission and strategic objectives.
The full description of the position, responsibilities, and desired qualifications is available at http://www.hathitrust.org/jobs_executive_director.
Applications should be made through the posting on the University of Michigan jobs site. Nominations may be sent to hathitrust-ed-nomination@umich.edu. Review of applications will begin September 30, 2013 and continue until the position is filled.
I encourage you to share this announcement widely and to think expansively about nominating qualified individuals. Your nomination need include no more than a name; we’ll do the rest!
Assistant Director
HathiTrust announces the appointment of Jeremy York as Assistant Director for HathiTrust. Jeremy began working for HathiTrust in 2008, a few months prior to its formal launch, and has been responsible for a broad range of coordinating activities among the partnership.
Government Documents Registry Online Focus Groups
HathiTrust is engaged in an initiative to create a metadata registry for the comprehensive corpus of US federal government documents produced from 1789 to the present. The registry will be available to everyone. (For more information on the project, visit the project page.)
As part of our efforts to define the functionality that will be needed for the registry, we will be holding a series of online focus groups. These sessions are open to anyone who is interested in providing feedback on ways that the registry might be used.
The focus groups will be held on:
- Date 1: 9/23 (Monday); 1-3 pm EST
- Date 2: 9/25 (Wednesday); 4-6 pm EST
- Date 3: 9/27 (Friday); 10-12 am EST
- Date 4: 10/1 (Tuesday); 2-4 pm EST
- Date 5: 10/2 (Wednesday); 12-2 pm EST
If you are interested in participating, please email valglenn@umich.edu by September 19th with your two preferred dates/times. If you are interested in giving feedback and are unable to attend any of the scheduled sessions, please contact Valerie Glenn at the above email address.
Ingest
Internet Archive
The University of Connecticut and the University of Illinois at Urbana-Champaign submitted bibliographic metadata for volumes digitized by the Internet Archive in preparation for deposit of the volumes into HathiTrust. HathiTrust corresponded with the Library of Congress, Pennsylvania State University, and the University of Maryland about future ingest of Internet Archive-digitized content.
Locally-Digitized
HathiTrust provided support to Texas A&M University and the University of the Illinois at Urbana-Champaign as they prepared to deposit locally-digitized content into HathiTrust. The University Press of Florida began to make arrangements to deposit backfile publications in HathiTrust on an open access basis, and the University of Pittsburgh renewed conversations about deposit of locally-digitized files.
In September, HathiTrust will begin development on two online validation services designed to help partners prepare locally-digitized content to HathiTrust specifications prior to deposit. The first is a web-based service to interactively validate single image files and is planned to be complete in September or early October. The second is conceived as a cloud storage-based service to validate entire volumes and is planned to be complete in October or November.
Projects
Bibliographic Data Management
The California Digital Library (CDL) team continued to work with staff at the University of Michigan to bring the current bibliographic management system at the University of Michigan and CDL’s new Zephir system into parity prior to beginning a parallel phase in which Zephir will shadow the current system over a period of several weeks. Institutions depositing content are contributing records both to the University of Michigan and CDL, and CDL will be working with the User Support Working Group to test new workflows for bibliographic record correction. Please see http://www.hathitrust.org/ingest_checklist for information about submitting records to HathiTrust. Any questions about Zephir or content ingest should be directed to feedback@issues.hathitrust.org.
Copyright Review
A summary of the determinations from HathiTrust copyright review activities in August is given below. See CRMS-US and CRMS-World for further information.
|
August |
Overall |
||
Public Domain Determinations |
All Determinations |
Public Domain Determinations |
All Determinations |
|
CRMS-US |
3,160 |
7,824 | 145,928 | 276,920 |
CRMS-World |
2,697 | 5,131 | 34,932 | 65,061 |
Total |
5,857 | 12,955 | 180,860 | 341,981 |
HathiTrust Research Center - Author Gender Metadata
Stacy Kowalczyk, Assistant Professor at Dominican University worked with Zong Peng of the HTRC technical team over the summer to identify author gender and make this information available through HTRC. They extracted 606,000 unique personal author strings looking at the nearly 3.2 million bibliographic records in the HTRC. Using the VIAF, census bureau data, and lists of names from several web based sources, the HTRC has a preliminary gender identification of approximately 80% of the public domain corpus (19% from VAIF). The initial findings show that 70% of all personal author name strings are male and 10% are female; the remaining 20% are yet to be identified. Dr. Kowalczyk and HTRC continue to improve the identification rate and verify and validate the initial gender identifications. The author gender information shows up as an attribute of a volume in the user’s HTRC workset.
mPach
Staff at the University of Michigan prepared presentations on mPach to be delivered at the RCDL’2013 and JATS-Con 2013 conferences. Staff refined workflows for enabling other institutions to use mPach and discussed challenges associated with rendering TeX that occurs within JATS articles. More information about the mPach project can be found at http://www.hathitrust.org/mpach.
Development Updates
HathiTrust institutions performed the following work related to applications and Web interfaces:
Analytics
Staff added event-tracking features to links in HathiTrust that make it possible to filter results in HathiTrust Analytics based on whether a user is logged in from a HathiTrust partner institution or a University of Michigan Friend Account. Information about HathiTrust’s policies on privacy and logging of user activity is available at http://www.hathitrust.org/privacy.
Data API
Staff released version 2 of the HathiTrust Data API. Documentation of the new version is available at http://www.hathitrust.org/data_api. Some of the differences from version 1 include the abilities to specify the formats of the resources returned and parameters such as the height, width, and size of images. Version 2 of the Data API has been configured to support retrieval of articles and article supplementary material in conjunction with mPach. Version 2 has new URL syntax and a version parameter is required. Version 1 of the Data API is scheduled to be taken out of service on November 1, 2013.
Full-text Search
Staff continued testing new high-performance storage for full-text search and developed a proof-of-concept process for integrating the new storage into daily indexing routines. Pricing for networking equipment to connect the storage to search indexing servers has been received and purchase is underway. Staff expect to install the new equipment and begin performance testing with live data in late September or early October.
Staff continued to test the Solr index’s grouping functionality as part of efforts to improve relevancy ranking of full-text search results. Staff also contributed an initial patch to Lucene to correct an issue with the ranking of long documents in the BM25 ranking algorithm.
PageTurner
Staff deployed a new robots.txt allowing search engines to crawl PageTurner and Collection Builder pages with a “noarchive” meta tag.
Outages
No outages were reported in August.
New Growth
As of September 1:
August | Overall | |
Boston College | 2 | 2,363 |
Columbia University | 0 | 65,033 |
Cornell University | 2,738 | 429,752 |
Duke University | 0 | 4,523 |
Harvard University | 0 | 236,069 |
Indiana University | 13 | 195,349 |
Library of Congress | 0 | 89,724 |
North Carolina State University | 0 | 3,196 |
Northwestern University | 939 | 36,420 |
New York Public Library | 1 | 288,357 |
Penn State University | 710 | 64,774 |
Princeton University | 0 | 251,705 |
Purdue University | 0 | 44,692 |
Universidad Complutense | 1 | 111,984 |
University of California | 12,084 | 3,407,326 |
The University of Chicago | 468 | 33,542 |
University of Florida | 5,518 | 7,586 |
University of Illinois | 5 | 111,134 |
University of Michigan | 3,310 | 4,653,823 |
University of Minnesota | 1,835 | 109,727 |
University of North Carolina, Chapel Hill | 434 | 17,022 |
University of Wisconsin | 5 | 555,815 |
University of Virginia | 0 | 50,817 |
Utah State | 0 | 117 |
Yale University | 0 | 23,678 |
Total | 28,063 | 10,794,528 |
Public Domain (~32%)
Total* | 21,915 | 3,452,123 |
* Includes volumes opened through copyright review and rights holder permissions
Summary of Issues Received by User Support
Issue Type | August | July |
Content | 344 | 322 |
Quality |
335 | 313 |
Collections |
6 | 8 |
Cataloging | 111 | 140 |
Access and Use | 183 | 190 |
Copyright |
120 | 125 |
Permissions |
4 | 8 |
Takedown |
1 | 2 |
Print on Demand |
0 | 0 |
Inter-library loan |
0 | 2 |
Full-PDF or e-copy requests |
21 | 16 |
Datasets |
4 | 1 |
Data Availability and APIs |
1 | 0 |
Reuse of content |
4 | 5 |
Web applications | 26 | 27 |
Functionality problems |
9 | 8 |
Problems with login specifically |
0 | 3 |
General Questions about Login |
3 | 2 |
Partners setting up login |
0 | 2 |
Usability issues |
1 | 2 |
Feature requests |
1 | 2 |
Partner Ingest | 8 | 5 |
General | 64 | 39 |
Partnership |
7 | 7 |
Infrastructure |
0 | 0 |
Miscellaneous |
57 | 32 |
Total | 736 | 723 |
Most Accessed Volumes
September Forecast
- Complete the development of ePub and PDF generation from JATS.
- Continue to explore improvements to relevancy ranking.
- Work on adding support for indexing of JATS articles.
Presentations
- Jeremy York, “HathiTrust: Key Concepts and Issues in Managing the Digital Archive”, ICPSR Summer Workshop, August 1, 2013.
You can follow HathiTrust on Twitter or Facebook, or subscribe to receive email updates (via Google Groups).