July 29, 2014
The HathiTrust Research Center (HTRC) is pleased to announce an exciting new project funded by the National Endowment for the Humanities (NEH). The NEH awarded $324,841 for “Exploring the Billions and Billions of Words in the HathiTrust Corpus with Bookworm: HathiTrust + Bookworm” (HT+BW), a two-year project that begins September 1, 2014, and will conclude August 31, 2016.
This project will be directed by J. Stephen Downie (Co-Director of the HTRC and Professor and Associate Dean of Research at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign) in collaboration with internal partners from the Illinois Informatics Institute (I3) and the University Library and external partners from Indiana University, Northeastern University, and Baylor College of Medicine.
For this project, the HTRC is partnering with the Cultural Observatory team that developed the Google Books Ngram Viewer together with Google. The goal of this collaboration is to implement a greatly enhanced open-source version of the Cultural Observatory’s “Bookworm”, a faceted text analysis and visualization tool used to track trends in the use of words and phrases over time. The HT+BW tool will assist scholars and their students in navigating the massive HT corpus by providing more powerful visualizations that incorporate multi-faceted “slicing and dicing” of the underlying data through an enhanced set of content-based and metadata-based features.
“The HathiTrust + Bookworm project will greatly enhance the value of HTRC for scholars,” said Downie, “by improving discovery, analysis, and exploration of their own research worksets as well as the entire HathiTrust corpus. The project itself reflects the quality of our collaborations both within HTRC and beyond, and I am especially impressed by the initiative taken by our GSLIS PhD student, Peter Organisciak, and our I3 colleague, Loretta Auvil, in working across departments and across institutions to bring this proposal to fruition.”
The HTRC is the official research arm of the HathiTrust, a repository that centrally collects image and text representations of library holdings digitized by the Google Books project and other mass-digitization efforts. Its mission is to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.