Available Indexes


('Hapax legomenon' means 'a word which appears only once'. I love the phrase.) A common strategy for finding misspellings is to pull all facets with count=1. In your case, this is also a good winnow for OCR-burps. If you OCR something twice, do you get the same OCR-burps? Why do this? Because you can build a large database of such burps, and that gives you a training set for a classifier! Once you have this, you can start attacking the single-count facets. If nobody has done this before, it would make a good grad student project. Also, doing the OCR twice is a good experimental design.
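As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not taken from any HathiTrust code: the function names `hapax_legomena` and `ocr_disagreements` are made up for this example, a plain `Counter` stands in for the index's facet counts, and aligning the two OCR runs word by word with `difflib` is just one assumed way of spotting where they differ.

```python
from collections import Counter
import difflib


def hapax_legomena(tokens):
    """Return the tokens that occur exactly once (the count=1 'facets')."""
    counts = Counter(tokens)
    return sorted(token for token, count in counts.items() if count == 1)


def ocr_disagreements(run_a, run_b):
    """Return (text_a, text_b) pairs where two OCR runs of the same page differ.

    Assumption: both runs are plain strings, and a word-level alignment is
    good enough to line them up.
    """
    words_a, words_b = run_a.split(), run_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # same position, different words in each run
            pairs.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return pairs


if __name__ == "__main__":
    print(hapax_legomena("the cat sat on the mat".split()))
    print(ocr_disagreements("the quick brown fox", "the qu1ck brown fox"))
```

The disagreement pairs are only candidates: someone (or something) still has to decide which side is the burp before they can serve as labelled training data for a classifier.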
You are browsing an archive of the HathiTrust website. In July 2023, we launched a new site at www.hathitrust.org.